On ego and sharing ideas

Line Charts, Models
A behavioral Nobel means a behavioral blog post

In honor of Richard Thaler’s recent Nobel prize win, I give you a post on a behavioral economics topic! Welcome to round 2 of A G2 Talks Models. Today’s topic: ego utility and the decision to speak up in class à la Koszegi 2006, an approach to belief-based utility adapted from Psychology and Economics lecture.

An admission of ego

I have a confession to make. I want you to think I’m smart. There, I said it. It is important to my self-image that you (yes, you on the other side of this screen!), my academic peers, and even the man (boy?) who tipsily mansplained the Monty Hall problem to me perceive me as intelligent. That is, intelligence as signaled by the occasional insightful comment, deep question, or quality idea that I get up the nerve to share.

Classrooms are environments in which lots of signaling of such smarts takes place. Professors ask questions, both rhetorical and not. They let us marinate in pregnant pauses and make a call for ideas. There’s a beat in which the tiny neuron-bureaucrat who is tasked with managing and organizing my brain activity files through some nascent concepts and responses. Is this any good?, she asks. Her supervisor isn’t sure either. Is this relevant? Yeah, but, is it too obvious? The supervisor prods her saying, time is of the essence. Internal hesitation over whether or not to share an idea in class still plagues me even after multiple decades of participation in the exercise. However, the difference is that now, in 18th grade, I can explicitly model that very idea-sharing decision.

A twist on classical utility: Enter ego…

To model this decision, I enter into a belief-based utility world. I define my utility function as follows: u = r – e + g√p, with r being the classroom response to the idea, e being the effort cost of sharing the idea, p being the probability that I think it’s a quality idea, and g being a parameter for “ego utility.” In classical utility world, this g√p term would not exist; I would simply weigh the benefit of sharing the idea r and the cost of sharing the idea e. Moreover, in a departure from classical economics assumptions, this form of belief-based utility displays information aversion, thus the square root on the p.

Now, let’s run through the outcomes based on class participation or class non-participation. If I take the jump and share my idea, I always expend some amount of effort e>0. Meanwhile, the benefit I derive depends on the ex post observable quality of the idea, as measured by the classroom response. If the idea was high quality, I gain r=1. If instead it was lacking or, shall we say, basic, I gain r=b where 1>b>0. If I keep my thoughts to myself, then e=0 and r=0.

In effect, if I share my idea, I receive u = p(1-e+g√1) + (1-p)(b-e+g√0). The first term on the right hand side is my perceived probability that the idea is of high quality multiplied by the associated payoff, while the second term is my perceived probability that the idea is basic multiplied by that associated payoff. Rearranging terms, u = p(1+g) + (1-p)b – e. Meanwhile, if I stay silent, my payoff is simply u = g√p.

Using this simple framework, I will share my idea with the class if and only if p(1+g) + (1-p)b – e > g√p. Simplified, I share my idea if and only if g(p-√p) + p + (1-p)b – e > 0.

Given this inequality, we can see that if g goes to infinity (i.e., my ego utility is huge), and p is not 0 or 1, the inequality will never hold, as p-√p will always be negative (since p is a fraction between 0 and 1); this means I will never raise my hand to share my ideas because I am so paralyzed by my massive ego utility. Meanwhile, as p approaches 0 or 1, the g(p-√p) term goes to 0, leaving the decision up to the inequality p + (1-p)b > e. Thus, I will speak if the expected value of the payoff to my comment exceeds the effort cost. (Recall that this is exactly what I do in the classical utility case in which I have no ego utility.)

While both of these two above conclusions seem predictable, there is a notable intriguing prediction from this model. You might expect that the greater my perceived probability that the idea is high quality, the more likely I am to share the idea. Well, this is not true. In other words, there is non-monotonicity in p. Say I have a moderate level of ego utility g and my p grows from a low to a higher level. This positive change in p could cause me to put my hand down even though now I am more confident in the quality of my idea. Weird! Ego utility allows there to be a negative correlation between my confidence in my idea’s quality and my willingness to share said idea.

Intuitively, as I become more confident in an idea, not only is there is a higher expected benefit to sharing the idea but there is also a higher possible loss of utility due to the ego utility term. The way these two opposing effects spar with one another can lead my hand to go up, down, and up again as my confidence in an idea increases.

Let’s illustrate this surprising concept graphically. We can parametrize the model and make visually explicit how the decision to my raise hand changes with p. Let’s set g=3, b=0.5, e=0.01. Given these values, I will speak my idea if and only if 3(p-√p) + p + (1-p)0.5 – 0.01 > 0. I.e., iff 3.5p – 3√p + 0.49 > 0. As such, I can graph the utility function with the full range of possible p values from 0 to 1 and accordingly color areas depending on whether or not they correspond to sharing an idea. (I share an idea if the utility function yields a value greater than or equal to 0; otherwise, I do not.)

share

The above illustrates that I am willing to share an idea when my probability that it is high quality is very low, but that I am no longer willing once the probability is a more moderately low value. This is evidence of the non-monotonicity in p in this model; I might lower my hand in class to protect my ego.

Anecdotal evidence, dynamics, and blog posting

I find ego utility fascinating and very believable when reflecting on my own experiences. For one, I have noticed that I often become more silent as conversation topics sway from topics in which I am novice to topics that I am moderately more knowledgable about. I feel acutely aware of the aforementioned tensions in the model; yes, I am more confident in my ideas in this realm, but I now have more to lose if I choose to share them. This is also a hesitation I feel internally when I talk with professors and friends about ideas. If the idea is undeveloped, there is really no harm in sharing it (p is low at that point); but, if I have been working on it and have a higher p, now, there is a chance I might realize that my idea was not up to snuff. In this sense, I can sometimes feel myself keeping ideas or projects to myself, as then they can’t be externally revealed to be low quality. I can sit on the sidelines and nurture my pet projects without a care in the world, stroking the ego-related term in my utility function.

But, in a more complex model, perhaps one that better represents my reality, idea quality is improved with idea sharing and collaboration. The model at hand is a one-shot game. I have an idea and I decide whether or not to share it. (The end.) But, in my flesh-and-blood/Stata-and-R universe, ideas do not disappear after that first instance of sharing; they develop dynamically. If I imagine refitting the model to mimic my reality, it is clear that silence for ego appeasement is a strategy that does not pay off long term…

I like to think that this is one reason why I write these posts — to share and accordingly develop ideas. In fact, when I started sharing R code online almost three years ago I was such a novice that I had a very low p regarding my data visualization capacities. In this way, ego utility was not able to hold me back from openly sharing my scripts. I was a strong advocate for transparency (still am) and at that time didn’t mind at all if my code looked like “a house built by a a child using nothing but a hatchet and a picture of a house.” However, if I were to imagine starting blogging now, I could see holding off, as I perceive my probability of being a decent coder as much larger than I did three years ago.

In the end, I am very happy that I chose to start sharing my work when I had a very small p. In fact, if you squint really hard, you can probably see me lounging on the utility function curve, fumbling to use ggplot2, somewhere in that first blue chunk of the graph.

Endnote

This post adapts model mechanics from Koszegi 2006. A Psychology and Economics lecture explicitly inspired and informed this piece. Lastly, here is the R notebook used to create the graphic in this post.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.
Advertisements

The United Nations of Words

Bar Charts

Newsletter e-mails are often artifacts of faded interests or ancient online shopping endeavors. They can be nostalgia-inducing — virtual time capsules set in motion by your past self at t-2, and egged on by your past self at t-1. Remember that free comedy show, that desk lamp purchase (the one that looks Pixar-esque), that political campaign… oof, actually let’s scratch that last one. But, without careful care, newsletters breed like rabbits and mercilessly crowd inboxes. If you wish to escape the onslaught of red notification bubbles, these e-mails are a sworn enemy whose defeat is an ever-elusive ambition.

However, there is a newsletter whose appearance in my inbox I perpetually welcome with giddy curiosity. That is, Jeremy Singer-Vine’s “Data is Plural.” Every week features a new batch of datasets for your consideration. One dataset in particular caught my eye in the 2017.07.19 edition:

UN General Debate speeches. Each September, the United Nations gathers for its annual General Assembly. Among the activities: the General Debate, a series of speeches delivered by the UN’s nearly 200 member states. The statements provide “an invaluable and, largely untapped, source of information on governments’ policy preferences across a wide range of issues over time,” write a trio of researchers who, earlier this year, published the UN General Debate Corpus — a dataset containing the transcripts of 7,701 speeches from 1970 to 2016.

The Corpus explains that these statements are “akin to the annual legislative state-of-the-union addresses in domestic politics.” As such, they provide a valuable resource for understanding international governments’ “perspective[s] on the major issues in world politics.” Now, I have been interested in playing around with text mining in R for a while. So a rich dataset of international speeches seems like a natural application of basic term frequency and sentiment analysis methods. As I am interested in comparing countries to one another, I need to select a subset of the hundreds to study. Given their special status, I focus exclusively on the five UN Security council countries: US, Britain, France, China, and Russia. (Of course, you could include many, many more countries of interest for this sort of investigation, but given the format of my desired visuals, five countries is a good cut-off.) Following in the typed footsteps of great code tutorials, I perform two types of analyses–a term frequency analysis and a sentiment analysis–to discuss the thousands of words that were pieced together to form these countries’ speeches.

Term Frequency Analysis

Term frequency analysis has been used in contexts ranging from studying Seinfeld to studying the field of 2016 GOP candidates. A popular metric for such analyses is tf-idf, which is a score of relative term importance. Applied to my context, the metric reveals words that are frequently used by one country but infrequently used by the other four. In more general terms, “[t]he tf-idf value increases proportionally to the number of times a word appears in the document, but is often offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.” (Thanks, Wikipedia.) In short, tf-idf picks out important words for our countries of interest. The 20 words with the highest tf-idf scores are illustrated below:

tfidftotal

China is responsible for 13 of the 20 words. Perhaps this means that China boasts the most unique vocabulary of the Security Council. (Let me know if you disagree with that interpretation.) Now, if instead we want to see the top 5 words for each country–to learn something about their differing focuses–we obtain the results below:

tfidf_country

As an American, I am not at all surprised by the picture of my country as one of democratic, god-loving, dream-having entrepreneurs who have a lot to say about Saddam Hussein. Other insights to draw from this picture are: China is troubled by Western superpower countries influencing (“imperialist”) or dominating (“hegemonism”) others, Russia’s old status as the USSR involved lots of name checks to leader Leonid Ilyich Brezhnev, and Britain and France like to talk in the third-person.

Sentiment Analysis

In the world of sentiment analysis, I am primarily curious about which countries give the most and least positive speeches. To figure this out, I calculate positivity scores for each country according to the three sentiment dictionaries, as summarized by the UC Business Analytics R Programming Guide:

The nrc lexicon categorizes words in a binary fashion (“yes”/“no”) into categories of positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The bing lexicon categorizes words in a binary fashion into positive and negative categories. The AFINN lexicon assigns words with a score that runs between -5 and 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.

Therefore, for the nrc and bing lexicons, my generated positivity scores will reflect the number of positive words less the number of negative words. Meanwhile, the AFINN lexicon positivity score will reflect the sum total of all scores (as words have positive scores if they possess positive sentiment and negative scores if they possess negative sentiment). Comparing these three positivity scores across the five Security Council countries yields the following graphic:

country_pos

The three methods yield different outcomes: AFINN and Bing conclude that China is the most positive country, followed by the US; meanwhile, the NRC identifies the US as the most positive country, with China in fourth place. And, despite all that disagreement, at least everyone can agree that the UK is the least positive! (How else do we explain “Peep Show”?)

Out of curiosity, I also calculate the NRC lexicon word counts for anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. I then divide the sentiment counts by total numbers of words attributed to each country so as to present the percentage of words with some emotional range rather than the absolute levels for that range. The results are displayed below in stacked and unstacked formats.

feelings1

feelings2

According to this analysis, the US is the most emotional country with over 30% of words associated with a NRC sentiment. China comes in second, followed by the UK, France, and Russia, in that order. However, all five are very close in terms of emotional word percentages so this ordering does not seem to be particularly striking or meaningful. Moreover, the specific range of emotions looks very similar country by country as well. Perhaps this is due to countries following some well-known framework of a General Debate Speech, or perhaps political speeches in general follow some tacit emotional script displaying this mix of emotions…

I wonder how such speeches compare to a novel or a newspaper article in terms of these lexicon scores. For instance, I’d imagine that the we’d observe more evidence of emotion in these speeches than in newspaper articles, as those are meant to be objective and clear (though this is less true of new forms of evolving media… i.e., those that aim to further polarize the public… or, those that were aided by one of the Security Council countries to influence an election in another of the Security Council countries… yikes), while political speeches might pick out words specifically to elicit emotion. It would be fascinating to investigate how emotional words are wielded in political speeches or new forms of journalistic media, and how that has evolved over time. (Quick hypothesis: fear is more present in the words that make up American media coverage and political discourse nowadays than it was a year ago…) But, I will leave that work (for now) to people with more in their linguistics toolkit than a novice knowledge of super fun R packages.

Code

As per my updated workflow, I now conduct projects exclusively using R notebooks! So, here is the R notebook responsible for the creation of the included visuals. And, here is the associated Github repo with everything required to replicate the analysis. Methods mimic those outlined by superhe’R’os Julia Silge and David Robinson in their “Text Mining with R” book.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

You think, therefore I am

Models
Intro

Hello world, I am now a G2 in economics PhD-land. This means I have moved up in the academic hierarchy; [1] I now reign over my very own cube, and [2] I am taking classes in my fields of interest. For me that means: Social Economics, Behavioral Economics, and Market Design. That also means I am coming across a lot of models, concepts, results that I want to tell you (whoever you are!) about. So, please humor me in this quasi-academic-paper-story-time… Today’s topic: Coate and Laury (1993)’s model of self-fulfilling negative stereotypes, a model presented in Social Economics.

Once upon a time…

There was a woman who liked math. She wanted to be a data scientist at a big tech company and finally don the company hoodie uniform she kept seeing on Caltrain. Though she had been a real ace at cryptography and tiling theory in college (this is the ultimate clue that this woman is not based on yours truly), she hadn’t been exposed to any coding during her studies. She was considering taking online courses to learn R or Python, or maybe one of those bootcamps… they also have hoodies, she thought.

She figured that learning to code and building a portfolio of work on Github would be a meaningful signal to potential employers as to her future quality as an employee. But, of course, she knew that there are real, significant costs to investing in developing these skills… Meanwhile, in a land far, far away–in an office ripe with ping pong tables–individuals at a tech company were engaged in decisions of their own: the very hiring decisions that our math-adoring woman was taking into account.

So, did this woman invest in coding skills and become a qualified candidate? Moreover, did she get hired? Well, this is going to take some equations to figure out, but, thankfully, this fictional woman as well as your non-fictitious female author dig that sort of thing.

Model Mechanics of “Self-Fulfilling Negative Stereotypes”

Let’s talk a little about this world that our story takes place in. Well, it’s 1993 and we are transported onto the pages of the American Economic Review. In the beginning Coates and Laury created the workers and the employers. And Coates and Laury said, “Let there be gender,” and there was gender. Each worker is also assigned a cost of investment, c. Given the knowledge of personal investment cost and one’s own gender, the worker makes the binary decision between whether or not to invest in coding skills and thus become qualified for some amorphous tech job. Based on the investment decision, nature endows each worker with an informative signal, s, which employers then can observe. Employers, armed with knowledge of an applicant’s gender and signal, make a yes-no hiring decision.

Of course, applicants want to be hired and employers want to hire qualified applicants. As such, payoffs are as follows: applicants receive w if they are hired and did not invest, w-c if they are hired and invested, and 0 if they are not hired. On the tech company side, a firm receives $q if they hire a qualified worker, -$u if they hire an unqualified worker, and 0 if they choose not to hire.

Note importantly that employers do not observe whether or not an applicant is qualified. They just observe the signals distributed by nature. (The signals are informative and we have the monotone likelihood ratio property… meaning the better the signal the more likely the candidate is qualified and the lower the signal the more likely the candidate isn’t qualified.) Moreover, gender doesn’t enter the signal distribution at all. Nor does it influence the cost of investment that nature distributes. Nor the payoffs to the employer (as would be the case in the Beckerian model of taste-based discrimination). But… it will still manage to come into play!

How does gender come into play then, you ask? In equilibrium! See, in equilibrium, agents seek to maximize expected payoffs. And, expected payoffs depend on the tech company’s prior probability that the worker is qualifiedp. Tech companies then use p and observed signal to update their beliefs via Bayes’ Rule. So, the company now has some posterior probability, B(s,p), that is a function of p and s. The company’s expected payoff is thus B(s,p)($q) – (1-B(s,p))(-$u) since that is the product of the probability of the candidate’s being qualified and the gain from hiring a qualified candidate less the product of the candidate’s being unqualified and the penalty to hiring an unqualified candidate. The tech company will hire a candidate if that bolded difference is greater than or equal to 0. In effect, the company decision is then characterized by a threshold rule such that they accept applicants with signal greater than or equal to s*(p) such that the expected payoff equals 0. Now, note that this s* is a function of p. That’s because if p changes in the equation B(s,p)($q) – (1-B(s,p))(-$u)=0, there’s now a new s that makes it hold with equality. In effect, tech companies hold different genders to different standards in this model. Namely, it turns out that s*(p) is decreasing in p, which means intuitively that the more pessimistic employer beliefs are about a particular group, the harder the standards that group faces.

So, let’s say, fictionally that tech companies thought, hmmm I don’t know, “the distribution of preferences and abilities of men and women differ in part due to biological causes and that these differences may explain why we don’t see equal representation of women in tech and leadership” [Source: a certain memo]. Such a statement about differential abilities yields a lower p for women than for men. In this model. that means women will face higher standards for employment.

Now, what does that mean for our math-smitten woman who wanted to decide whether to learn to code or not? In this model, workers anticipate standards. Applicants know that if they invest, they receive an amount = (probability of being above standard as a qualified applicant)*w +(probability of falling below standard as a qualified applicant)*0 – c. If they don’t invest, they receive = (probability of being above standard as an unqualified applicant)*w +(probability of falling below standard as an unqualified applicant)*0. Workers invest only if the former is greater than or equal to the latter. If the model’s standard is higher for women than men, as the tech company’s prior probability that women are qualified is smaller than it is for men, then the threshold for investing for women will be higher than it is for men. 

So, if in this model-world, that tech company (with all the ping pong balls) is one of a ton of identical tech companies that believe, for some reason or another, that women are less likely to be qualified than men for jobs in the industry, women are then induced to meet a higher standard for hire. That higher standard, in effect, is internalized by women who then don’t invest as much. In the words of the original paper, “In this way, the employers’ initial negative beliefs are confirmed.”

The equilibrium, therefore, induces worker behavior that legitimizes the original beliefs of the tech companies. This is a case of statistical discrimination that is self-fulfilling. It is an equilibrium that is meant to be broken, but it is incredibly tricky to do so. Once workers have been induced to validate employer beliefs, then those beliefs are correct… and, how do you correct them?

I certainly don’t have the answer. But, on my end, I’ll keep studying models and attempting to shift some peoples’ priors…

Screen Shot 2017-09-06 at 10.56.41 PM

Oh, and my fictional female math-enthusiast will be endowed with as many tech hoodies as she desires. In my imagination, she has escaped the world of this model and found a tech company with a more favorable prior. A girl can dream…

Endnote

This post adapts Coate and Laury (1993) to the case of women in tech in order to demonstrate and summarize the model’s dynamics of self-fulfilling negative stereotypes. Discussion and lecture in Social Economics class informed this post. Note that these ideas need not be focused on gender and tech. They are applicable to many other realms, including negative racial group stereotypes and impacts on a range of outcomes, from mortgage decisions to even police brutality.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 

Senate Votes Visualized

Grid Maps

It has been exactly one week since the Senate voted to start debate on Obamacare. There were three Obamacare repeal proposals that followed in the wake of the original vote. Each one failed, but in a different way. News outlets such as the NYTimes did a great job reporting how each Senator voted for all the proposals. I then used that data to geographically illustrate Senators’ votes for each Obamacare-related vote. See below for a timeline of this past week’s events and accompanying R-generated visuals.

Tuesday, July 25th, 2017

The senate votes to begin debate.

deb_final

This passes 51-50 with Pence casting the tie-breaking vote. The visual shows the number of (R) and (D) Senators in each state as well as how those Senators voted. We can easily identify Collins and Murkowski, the two Republicans who voted NO, by the purple halves of their states (Maine and Alaska, respectively). While Democrats vote as a bloc in this case and in the impending three proposal votes, it is the Republicans who switch between NO and YES over the course of the week of Obamacare votes. Look for the switches between red and purple.

Later that day…

The Senate votes on the Better Care Reconciliation Act.

rr_final

It fails 43-57 at the mercy of Democrats, Collins, Murkowski, and a more conservative bloc of Republicans.

Wednesday, July 26th, 2017

The Senate votes on the Obamacare Repeal and Reconciliation Act.

pr_final

It fails 45-55 at the mercy of Democrats, Collins, Murkowski, and a more moderate bloc of Republicans.

Friday, July 28th, 2017

The Senate votes on the Health Care Freedom Act.

sk_final

It fails 49-51 thanks to Democrats, Collins, Murkowski, and McCain. To hear the gasp behind the slice of purple in AZ, watch the video below.

Code

This was a great exercise in using a few R packages for the first time. Namely, geofacet and magick. The former is used for creating visuals for different geographical regions, and is how the visualization is structured to look like the U.S. The latter allows you to add images onto plots, and is how there’s a little zipper face emoji over DC (as DC has no Senators).

In terms of replication, my R notebook for generating included visuals is here. The github repo is here.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

A rising tide lifts all podcasts

Scatter Plots

A personal history of podcast listening

One afternoon of my junior year, I listened to a chunk of a Radiolab episode about “Sleep” as I myself heavily sank into unconsciousness. It was like guided meditation… supported, in part, by the Alfred P. Sloan Foundation. Jad and Robert’s forays on Radiolab quickly became my new bedtime stories. They helped me transition from days with my nose deep in books and, more accurately, my laptop to dreams that veered away from the geographic markers of one tiny college town in a valley of the Berkshires.

The Radiolab archives were a soundtrack to my last years of college and to my transition from “student” at a college to “staff member” at a university. A few months into my new place in the world, I found myself discussing Sarah Koenig’s Serial with my colleagues in neighboring cubicles. I also wasn’t a stranger to the virtual water cooler of /r/serialpodcast. I became so entrenched in the podcast’s following that I ended up being inspired to start blogging in order to document reddit opinion trends on the topic.

Faced with regular Caltrain rides from the crickets and “beam” store of Palo Alto to the ridiculous-elevation-changes of SF, I started listening to Gilmore Guys. You know, the show about two guys who talk about the Gilmore Girls. I did not think this would take (I mean, there were hundreds of episodes–who would listen to all that?!) but I was very wrong. The two hosts accompanied me throughout two full years of solo moments. Their banter bounced next to me during mornings biking with a smile caked across my face and palm trees to my left and right as well as days marked by fierce impostor syndrome. Their bits floated next to me in the aftermath of medical visits that frightened me and suburban grocery shopping endeavors (which also sometimes frightened me). Their words, light and harmless, sat with me during evenings of drinking beer on that third-of-a-leather-couch I bought on craigslist and silent moments of self-reflection.

That might sound like pretty heavy lifting for a podcast. But, (silly as it might sound) it was my security blanket throughout a few years of shifting priorities and support networks–tectonic plates grumbling under the surface of my loosely structured young adult life.

When it came time to move to Cambridge from Palo Alto, I bought a Leesa mattress thanks to Scott Aukerman’s 4am mattress store advert bit from Comedy Bang Bang. (Sorry, Casper.) Throughout my first doctoral academic year, I regularly listened to Two Dope Queens as I showered and made dinner after frisbee practices. Nowadays, like a good little liberal, I listen to the mix of political yammering, gossip, and calls to arms that makes up Pod Save America.

Podcasts seem to be an increasingly important dimension of our alone time. A mosaic of podcast suggestions is consistently part of entertainment recommendations across friends… which leads me to my question of interest: How are podcasts growing? Are there more created nowadays, or does it just feel like that since we discuss them more? 

Methodological motivation

In following the growth of the R-Ladies organization and the exciting work of associated women, I recently spotted a blog post by R-lady Lucy McGowan. In this post, Lucy looks at the growth of so-called ‘Drunk’ Podcasts. She finds a large growth in that “genre” (if you will) while making great usage of a beer emoji. Moreover, she expresses that:

While it is certainly true that the number of podcasts in general has absolutely increased over this time period, I would be surprised if the increase is as dramatic as the increase in the number of “drunk” podcasts.

I was super skeptical about this statement. I figured the increase in many podcast realms would be just as dramatic, if not more dramatic than that in the ‘drunk’ podcasts universe. So, I took this skepticism as an opportunity to build on Lucy’s code and emoji usage and look into release trends in other podcasting categories. Think of this all as one big excuse to try using emojis in my ggplot creations while talking about podcasts. (Thank you to the author of the emoGG package, a hero who also created Beyoncé color palettes for R.)

Plotting podcasts

I look into podcasting trends in the arenas of ‘sports’, ‘politics’, ‘comedy’ and ‘science.’ I figured these were general umbrella terms that many pods should fall under. However, you can easily adapt the code to look into different genres/search terms if you’re curious about other domains. (See my R notebook for reproducible work on this subject.) I, like Lucy, then choose emojis to use in my eventual scatterplot. Expressing a concept as complex as politics with a single emoji was challenging, but a fun exercise in using my millennial skillset.  (The ‘fist’ emoji was the best I could come up with for ‘politics’ though I couldn’t figure out how to modify the skin tone. I’m open to other suggestions on this front. You can browse through options with unicode here.)

In the end, I combine the plots for all four podcasting categories into one aggregated piece of evidence showing that many podcasts genres have seen dramatic increases in 2017. The growth in number of releases is staggering in all four arenas. (Thus, the title ‘A rising tide lifts all podcasts.’) So, this growth doesn’t seem to be unique to the ‘drunk’ podcast. In fact, these more general/conventional categories see much more substantive increases in releases.

pods

While the above deals with podcast releases, I would be very curious to see trends in podcast listening visualized. For instance, one could use the American Time Use Survey to break down people’s leisure consumption by type during the day. (It seems that the ATUS added “listening to podcast” in 2015.) I’d love to see some animated graphics on entertainment consumption over the hours reminiscent of Nathan Yau’s previous amazing work (“A Day in the Life of Americans”) with ATUS data.

Putting down the headphones

Regardless of the exact nature of the growth in podcasts over the past years, there is no doubt the medium has come to inhabit a unique space. Podcasts feel more steeped in solitude than other forms of entertainment like television or movies, which often are consumed in group settings. Podcasts have helped me re-learn how to be alone (but not without stories, ideas, and my imagination) and enjoy it. And, I am an only-child, so believe me… I used to be quite good at that.

The Little Dataset–despite this focus on podcasts–is brought to you by WordPress and not Squarespace. 🙂

Code

Check out this R Notebook for the code needed to reproduce the graphic. You can also see my relevant github repository.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

The One With All The Quantifiable Friendships, Part 2

Bar Charts, Line Charts, Nightingale Graphs, Stacked Area Charts, Time Series

Since finishing my first year of my PhD, I have been spending some quality time with my computer. Sure, the two of us had been together all throughout the academic year, but we weren’t doing much together besides pdf-viewing and type-setting. Around spring break, when I discovered you can in fact freeze your computer by having too many exams/section notes/textbooks simultaneously open, I promised my MacBook that over the summer we would try some new things together. (And that I would take out her trash more.) After that promise and a new sticker addition, she put away the rainbow wheel.

Cut to a few weeks ago. I had a blast from the past in the form of a Twitter notification. Someone had written a post about using R to analyze the TV show Friends, which was was motivated by a similar interest that drove me to write something about the show using my own dataset back in 2015. In the post, the author, Giora Simchoni, used R to scrape the scripts for all ten seasons of the show and made all that work publicly available (wheeeeee) for all to peruse. In fact, Giora even used some of the data I shared back in 2015 to look into character centrality. (He makes a convincing case using a variety of data sources that Rachel is the most central friend of the six main characters.) In reading about his project, I could practically hear my laptop humming to remind me of its freshly updated R software and my recent tinkering with R notebooks. (Get ready for new levels of reproducibility!) So, off my Mac and I went, equipped with a new workflow, to explore new data about a familiar TV universe.

Who’s Doing The Talking?

Given line by line data on all ten seasons, I, like Giora, first wanted to look at line totals for all characters. In aggregating all non-“friends” characters together, we get the following snapshot:

total

First off, why yes, I am using the official Friends font. Second, I am impressed by how close the totals are for all characters though hardly surprised that Phoebe has the least lines. Rachel wouldn’t be surprised either…

Rachel: Ugh, it was just a matter of time before someone had to leave the group. I just always assumed Phoebe would be the one to go.

Phoebe: Ehh!!

Rachel: Honey, come on! You live far away! You’re not related. You lift right out.

With these aggregates in hand, I then was curious: how would line allocations look across time? So, for each episode, I calculate the percentage of lines that each character speaks, and present the results with the following three visuals (again, all non-friends go into the “other” category):

lines1lines2lines3

Tell me that first graph doesn’t look like a callback to Rachel’s English Trifle. Anyway, regardless of a possible trifle-like appearance, all the visuals illustrate dynamics of an ensemble cast; while there is noise in the time series, the show consistently provides each character with a role to play. However, the last visual does highlight some standouts in the collection of episodes that uncharacteristically highlight or ignore certain characters. In other words, there are episodes in which one member of the cast receives an unusually high or low percentage of the lines in the episode. The three episodes that boast the highest percentages for a single member of the gang are: “The One with Christmas in Tulsa” (41.9% Chandler), “The One With Joey’s Interview” (40.3% Joey), “The One Where Chandler Crosses a Line” (36.3% Chandler). Similarly, the three with the lowest percentages for one of the six are: “The One With The Ring” (1.5% Monica) , “The One With The Cuffs” (1.6% Ross), and “The One With The Sonogram At The End” (3.3% Joey). The sagging red lines of the last visual identify episodes that have a low percentage of lines spoken by a character outside of the friend group. In effect, those dips in the graph point to extremely six-person-centric episodes, such as “The One On The Last Night” (0.4% non-friends dialogue–a single line in this case), “The One Where Chandler Gets Caught” (1.1% non-friends dialogue), and “The One With The Vows” (1.2% non-friends dialogue).

The Men Vs. The Women

Given this title, here’s a quick necessary clip:

Now, how do the line allocations look when broken down by gender lines across the main six characters? Well, the split consistently bounces around 50-50 over the course of the 10 seasons. Again, as was the case across the six main characters, the balanced split of lines is pretty impressive.

gender1gender2

Note that the second visual highlights that there are a few episodes that are irregularly man-heavy. The top three are: “The One Where Chandler Crosses A Line” (77.0% guys), “The One With Joey’s Interview” (75.1% guys), and “The One With Mac and C.H.E.E.S.E.” (70.2% guys). There are also exactly two episodes that feature a perfect 50-50 split for lines across gender: “The One Where Rachel Finds Out” and “The One With The Thanksgiving Flashbacks.”

Say My Name

How much do the main six characters address or mention one another? Giora addressed this question in his post, and I build off of his work by including nicknames in the calculations, and using a different genre of visualization. With respect to the nicknames–“Mon”, “Rach”, “Pheebs”, and “Joe”–“Pheebs” is undoubtably the stickiest of the group. Characters say “Pheebs” 370 times, which has a comfortable cushion over the second-place nickname “Mon” (used 73 times). Characters also significantly differ in their usage of each others’ nicknames. For example, while Joey calls Phoebe “Pheebs” 38.3% of the time, Monica calls her by this nickname only 4.6% of the time. (If you’re curious about more numbers on the nicknames, check out the project notebook.)

Now, after adding in the nicknames, who says whose name? The following graphic addresses that point of curiosity:

mentions

The answer is clear: Rachel says Ross’s name the most! (789 times! OK, we get it, Rachel, you’re in love.) We can also see that Joey is the most self-referential with 242 usages of his own name–perhaps not a shock considering his profession in the entertainment biz. Overall, the above visual provides some data-driven evidence of the closeness between certain characters that is clearly evident in watching the show. Namely, the Joey-Chandler, Monica-Chandler, Ross-Rachel relationships that were evident in my original aggregation of shared plot lines are still at the forefront!

Meta-data

Comparing the above work to what I had originally put together in January 2015 is a real trip. My original graphics back in 2015 were made entirely in Excel and were as such completely unreproducible, as was the data collection process. The difference between the opaqueness of that process and the transparency of sharing notebook output is super exciting to me… and to my loyal MacBook. Yes, yes, I’ll give you another sticker soon.

Let’s see the code!

Here is the html rendered R Notebook for this project. Here is the Github repo with the markdown file included.

*Screen fades to black* 
Executive Producer: Alex Albright

© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 

A Bellman Equation About Nothing

Line Charts, Models

Cold Open [Introduction]

A few years ago I came across a short paper that I desperately wanted to understand. The magnificent title was “An Option Value Problem from Seinfeld” and the author, Professor Avinash Dixit (of Dixit-Stiglitz model fame), therein discussed methods of solving for “sponge-worthiness.” I don’t think I need to explain why I was immediately drawn to an academic article that focuses on Elaine Benes, but for those of you who didn’t learn about the realities of birth control from this episode of 1990’s television, allow me to briefly explain the relevant Seinfeld-ism. The character Elaine Benes[1] loyally uses the Today sponge as her preferred form of contraception. However, one day it is taken off the market and, after trekking all over Manhattan, our heroine manages to find only one case of 60 sponges to purchase. The finite supply of sponges poses a daunting question to Elaine… namely, when should she choose to use a sponge? Ie, when is a given potential partner sponge-worthy?

JERRY: I thought you said it was imminent.

ELAINE: Yeah, it was, but then I just couldn’t decide if he was really sponge-worthy.

JERRY: “Sponge-worthy”?

ELAINE: Yeah, Jerry, I have to conserve these sponges.

JERRY: But you like this guy, isn’t that what the sponges are for?

ELAINE: Yes, yes – before they went off the market. But I mean, now I’ve got to re-evaluate my whole screening process. I can’t afford to waste any of ’em.

–“The Sponge” [Seinfeld Season 7 Episode 9]

As an undergraduate reading Professor Dixit’s introduction, I felt supremely excited that an academic article was going to delve into the decision-making processes of one of my favorite fictional characters. However, the last sentence in the introduction gave me pause: “Stochastic dynamic programming methods must be used.” Dynamic programming? Suffice it to say that I did not grasp the methodological context or mathematical machinery embedded in the short and sweet paper. After a few read-throughs, I filed wispy memories of the paper away in some cluttered corner of my mind… Maybe one day this will make more sense to me… 

Flash forward to August 2016. Professor David Laibson, the economics department chair, explains to us fresh-faced G1’s (first-year PhD’s) that he will be teaching us the first part of the macroeconomics sequence… Dynamic Programming. After a few days of talking about Bellman equations, I started to feel as if I had seen related work in some past life. Without all the eeriness of a Westworld-esque robot, I finally remembered the specifics of Professor Dixit’s paper and decided to revisit it with Professor Laibson’s lectures in mind. Accordingly, my goal here is to explain the simplified model set-up of the aforementioned paper and illustrate how basics from dynamic programming can be used in “solving for spongeworthiness.”

Act One [The Model]

Dynamic programming refers to taking a complex optimization problem and splitting it up into simpler recursive sub-problems. Consider Elaine’s decision as to when to use a sponge. We can model this as an optimal stopping problem–ie, when should Elaine use the sponge and thus give up the option value of holding it into the future? The answer lies in the solution to a mathematical object called the Bellman equation, which will represent Elaine’s expected present value of her utility recursively.

Using a simplified version of the framework from Dixit (2011), we can explain the intuition behind setting up and solving a Bellman equation. First, let’s lay out the modeling framework. For the sake of computational simplicity, assume Elaine managed to acquire only one sponge rather than the case of 60 (Dixit assumes she has a general m sponges in his set-up, so his computations are more complex than mine). With that one glorious sponge in her back pocket, Elaine goes about her life meeting potential partners, and yada yada yadaTo make the yada yada’s explicit, we say Elaine lives infinitely and meets one new potential partner every day t who is of some quality Qt. Elaine is not living a regular continuous-time life, instead she gets one romantic option each time period. This sets up the problem in discrete-time since Elaine’s decisions are day-by-day rather than infinitesimally-small-moment-by-infinitesimally-small-moment. If we want to base this assumption somewhat in reality, we could think of Elaine as using Coffee Meets Bagel, a dating app that yields one match per day. Ie, one “bagel” each day.

Dixit interprets an individual’s quality as the utility Elaine receives from sleeping with said person. Now, in reality, Elaine would only be able to make some uncertain prediction of a person’s quality based on potentially noisy signals. The corresponding certainty equivalent [the true quality metric] would be realized after Elaine slept with the person. In other words, there would be a distinction between ex post and ex ante quality assessments—you could even think of a potential partner as an experience good in this sense. (Sorry to objectify you, Scott Patterson.) But, to simplify our discussion, we assume that true quality is observable to Elaine—she knows exactly how much utility she will gain if she chooses to sleep with the potential partner of the day. In defense of that assumption, she does vet potential partners pretty thoroughly.

Dixit also assumes quality is drawn from a uniform distribution over [0,1] and that Elaine discounts the future exponentially by a factor of δ in the interval (0,1). Discounting is a necessary tool for agent optimization problems since preferences are time dependent. Consider the following set-up for illustrative purposes: Say Elaine gains X utils from eating a box of jujyfruit fruit today, then using our previously defined discount factor, she would gain δX from eating the box tomorrow, δ2X from eating it the day after tomorrow, and so on. In general, she gains δnX utils from consuming it n days into the future—thus the terminology “exponential discounting.” Given the domain for δ, we know unambiguously that X > δX >δ2X >… and on. That is, if the box of candy doesn’t change between periods (it is always X), (assuming it yields positive utility—which clearly it must given questionable related life decisions.) Elaine will prefer to consume it in the current time period. Ie, why wait if there is no gain from waiting? On the other hand, if Elaine wants to drink a bottle of wine today that yields Y utils, but the wine improves by a factor of w>1 each day, then whether she prefers to drink it today or tomorrow depends on whether Y—the present utility gain of the current state of the wine—or δ(wY)—the discounted utility gain of the aged (improved) wine—is greater. (Ie, if δw>1, she’ll wait for tomorrow.) If Elaine also considers up until n days into the future, she will be comparing, Y,  δ(wY), δ2X(w2Y), …, and δn(wnY).

In our set-up Elaine receives some quality offer each day that is neither static (as in the jujyfruit fruit example) nor deterministically growing (as in the wine example), rather the quality is drawn from a defined distribution (the uniform distribution on the unit interval—mainly chosen to allow for straightforward computations). While quality is observable in the current period, the future draws are not observable, meaning that Elaine must compare her current draw with an expectation of future draws. In short, everyday Elaine has the choice whether to use the sponge and gain Qt through her utility function, or hold the sponge for a potentially superior partner in the future. In other words, Elaine’s current value function is expressed as a choice between the “flow payoff” Qt and the discounted “continuation value function.” Since she is utility maximizing, she will always choose the higher of these two options. Again, since the continuation value function is uncertain, as future quality draws are from some distribution, we must use the expectation operator in that piece of the maximization problem. Elaine’s value function is thus:

eq1

This is the Bellman equation of lore! It illustrates a recursive relationship between the value functions for different time periods, and formalizes Elaine’s decision as a simple optimal stopping problem.

Act Two [Solving for Sponge-worthiness]

To solve for sponge-worthiness, we need to find the value function that solves the Bellman equation, and derive the associated optimal policy rule. Our optimal policy rule is a function that maps each point in the state space (the space of possible quality draws) to the action space such that Elaine achieves payoff V(Qt) for all feasible quality draws in [0,1]. The distribution of Qt+1 are stationary and independent of Qt, as the draws are perpetually from U[0,1]. (Note to the confounded reader: don’t think of the space of quality draws as akin to some jar of marbles in conventional probability puzzles—those in which the draw of a red marble means there are less red to draw later—since our distribution does not shift between periods. For more on other possible distributions, see Act Four.) Due to the aforementioned stationarity and independence, the value of holding onto the sponge [δEV(Qt+1)] is constant for all days. By this logic, if a potential partner of quality Q’ is sponge-worthy, then Q’ ≥ δEV(Qt+1)! Note that for all Q”>Q’, Q”>δEV(Qt+1), so some partner of quality Q” must also be considered sponge-worthy. Similarly, if a person of quality Q’ is not sponge-worthy, then δEV(Qt+1) ≥ Q’ and for all Q”<Q’, Q”<δEV(Qt+1), so any partner of quality Q” must also not be sponge-worthy. Thus, the functional form of the value function is:

eq2

In other words, our solution will be a threshold rule where the optimal policy is to use the sponge if Q> Q* and hold onto the sponge otherwise. The free parameter we need to solve for is Q*, which we can conceptualize as the all-powerful quality level that separates the sponge-worthy from the not!

Act Three [What is Q*?]

When Q= Q*, Elaine should be indifferent between using the sponge and holding onto it. This means that the two arguments in the maximization should be equal–that is, the flow payoff [Q*] and the discounted continuation value function [δEV(Qt+1)]. We can thus set Q*=δEV(Qt+1and exploit the fact that we defined Q ~ U[0,1], to make the following calculations:

eqs3

The positive root yields a Q* >1, which would mean that Elaine never uses the sponge. This cannot be the optimal policy, so we eliminate this root. In effect, we end up with the following solution for Q*:

eq4

Given this Q*, it is optimal to use the sponge if Q> Q*, and it is optimal to hold the sponge Q* ≥ Qt. Thus, as is required by the definition of optimal policy, for all values of Qt:

eq5

We can interpret the way the Q* threshold changes with the discount factor δ using basic economic intuition. As δ approaches 1 (Elaine approaches peak patience), Q* then approaches 1, meaning Elaine will accept no partner but the one of best possible quality. At the other extreme, as δ approaches 0 (Elaine approaches peak impatience), Q* then approaches 0, meaning Elaine will immediately use the sponge with the first potential partner she meets.

To make this visually explicit, let’s use a graph to illustrate Elaine’s value function for some set δ. Take δ=0.8, then Q*=0.5, a clear-cut solution for the sponge-worthiness threshold. Given these numbers, the relationship between the value function and quality level can be drawn out as such:

valfn

What better application is there for the pgfplots package in LaTeX?!

The first diagram illustrates the two pieces that make up Elaine’s value function, while the second then uses the black line to denote the value function, as the value function takes on the maximum value across the space of quality draws. Whether the value function conforms to the red or green line hinges on whether we are in the sponge-worthy range or not. As explained earlier, before the sponge-worthiness threshold, the option value of holding the sponge is the constant such that Q*=δEV(Qt+1). After hitting the magical point of sponge-worthiness, the value function moves one-for-one with Qt. Note that alternative choices for the discount rate would yield different Q*’s, which would shift the red line up or down depending on the value, which in turn impact the leftmost piece of the value function in the second graph. These illustrations are very similar to diagrams we drew in Professor Laibson’s module, but with some more advanced technical graph labelings than what we were exposed to in class (ie, “no sponge for you” and “sponge-worthy”). 

Act Four [Extensions]

In our set-up, the dependence of the value function is simple since there is one sponge and Elaine is infinitely lived. However, it could be that we solve for a value function with more complex time and resource dependence. This could yield a more realistic solution that takes into account Elaine’s age and mortality and the 60 sponges in the valuable case of contraception. We could even perform the sponge-worthiness calculations for Elaine’s monotonically increasing string of sponge quantity requests: 3, 10, 20, 25, 60! (These numbers based in the Seinfeld canon clearly should have been in the tabular calculations performed by Dixit.)

For computational purposes, we also assumed that quality is drawn independently each period (day) from a uniform distribution on the unit interval. (Recall that a uniform distribution over some interval means that each value in the interval has equal probability.) We could alternatively consider a normal distribution, which would likely do a better job of approximating the population quality in reality. Moreover, the quality of partners could be drawn from a distribution whose bounds deterministically grow over time, as there could be an underlying trend upward in the quality of people Elaine is meeting. Perhaps Coffee Meets Bagel gets better at matching Elaine with bagels, as it learns about her preferences.

Alternatively, we could try and legitimize a more specific choice of a distribution using proper Seinfeld canon. In particular, Season 7 Episode 11 (“The Wink,” which is just 2 episodes after “The Sponge”) makes explicit that Elaine believes about 25% of the population is good looking. If we assume Elaine gains utility only from sleeping with good looking people, we could defend using a distribution such that 75% of quality draws are exactly 0 and the remaining 25% of draws are from a normal distribution ranging from 0 to 1.  (Note that Jerry, on the other hand, believes 95% of the population is undateable, so quality draws for Jerry would display an even more extreme distribution–95% of draws would be 0 and the remaining 5% could come from a normal distribution from 0 to 1.)

Regardless of the specific distribution or time/resource constraint choices, the key take-away here is the undeniably natural formulation of this episode’s plot line as an optimal stopping problem. Throughout the course of our six weeks with Professor Laibson, our class used dynamic programming to approach questions of growth, search, consumption, and asset pricing… while these applications are diverse and wide-ranging, don’t methods seem even more powerful when analyzing fictional romantic encounters!?

elaine

Speaking of power

References

As explained earlier, this write-up is primarily focused on the aforementioned Dixit (2011) paper, but also draws on materials from Harvard’s Economics 2010D sequence. In particular, “Economics 2010c: Lecture 1 Introduction to Dynamic Programming” by David Laibson (9/1/2016) & “ECON 2010c Section 1” by Argyris Tsiaras (9/2/2016).


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.