Testing for Local Continuity in Racial Animus

Choropleths, Regressions

This past spring I was tasked with writing a final paper for my Comparative Historical Economic Development course. In brainstorming, I started a casual fling with one idea that quickly escalated and led to long spring break dates at the library (collecting data, making maps). In a meeting with my professor back at Harvard, I realized that I had been caught up in the honeymoon phase of an idea. Taking off my rose-colored glasses, it became clear I had to end the tryst so I could focus on more promising paths to a paper…

In the end, I wrote about a different idea. Meanwhile, the original one lived in solitude in an inactive Dropbox folder, a digital dead end. While I never developed it into a paper, why not shape it into a blog post? After all, there were interesting datasets and choropleths to share. So, here I am, throwing it a bone! If only for the sake of intellectual closure…

Persecution Perpetuated → Animus Alive

I was immediately struck by an idea after reading the QJE article “Persecution Perpetuated: The Medieval Origins of Anti-Semitic Violence in Nazi Germany.” Co-authors Voigtländer and Voth use an incredible dataset of about 400 towns “where Jewish communities are documented for both the medieval period and interwar Germany.” They find local continuity in anti-Semitism over 600 years. Local continuity in this context means that attacks on Jews in the 1920’s were six times more likely in towns and cities that had blamed Jews for the Black Death and murdered them during the plague years of 1348-50. Let me repeat myself: local continuity over SIX HUNDRED YEARS. (More than half a millennium.) The paper provides convincing empirical evidence that group-based persecution (anti-Semitism in this case) can meaningfully persist at the local level. History matters, the data screams.

After reading “Persecution Perpetuated,” I was curious whether I could merge recently available data sources to test whether racial animus in the US displays similar local continuity. (Possible alliterative titles to pay homage to “Persecution Perpetuated” include: “Animus Alive” and “Malice Maintained.”) Specifically, what sprang to mind as a possible dependent variable was Seth Stephens-Davidowitz‘s racial animus measure, which is based on Google search data. (I’ve always wanted to play around with Seth’s Google data, so I seized on a possible academic opportunity to do so.)

What does it mean to measure racial animus using Google data? Seth proxies for a geographic area’s racial animus by calculating the percent of its Google searches (2004-2007) that included the n-word or its plural. At its finest geographic level, the Google data are available for US Designated Market Areas. (There are 210 such DMA’s in the US.) Specifically, the “racially charged search rate” for DMA j is 100 times the ratio [n-word Google searches / total Google searches] for j, divided by the maximum of that same ratio across all DMA’s. The necessary underlying assumption is that racial animus makes one more likely to make an n-word Google search. (It does not have to be the case that “every individual using the term harbors racial animus, nor that every individual harboring racial animus will use this term on Google.”) (If you want to know more about this measure, read Seth’s NYTimes piece about his academic work as well.)
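Written out in notation (this is just Seth’s definition restated, nothing new):

```latex
\text{Racially Charged Search Rate}_j \;=\; 100 \times
\frac{\left(\text{n-word searches}_j \,/\, \text{total searches}_j\right)}
     {\max_{k}\left(\text{n-word searches}_k \,/\, \text{total searches}_k\right)}
```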

Why is using Google search data attractive? Seth argues that survey data is unlikely to paint an accurate picture of racial animus because, well, people lie. (Thus the title of his book, Everybody Lies, which I highly recommend if you’re interested in data-driven social science.) Meanwhile, “the conditions under which people [Google] search – online, likely alone, and not participating in an official survey – limit concern of social censoring.” In short, Google search data allows us to access a snapshot of attitudes or beliefs that might otherwise be inaccessible with traditional surveys.

While Google search is a new phenomenon in the grand scheme of history, I was curious whether historical data related to racial animus could be a powerful predictor of these modern racially charged search patterns. I.e., I sought out a relevant historical independent variable, which led me to Virginia Commonwealth University (VCU)’s project “Mapping the Second Ku Klux Klan, 1915-1940.” History Professor John Kneebone constructed a list of local KKK chapters (klaverns) using information from a large set of the group’s official publications. (More here.) The project makes visually explicit the widespread nature of the KKK; “Everywhere there was population, there was the Klan,” Kneebone explained. He then worked with digital librarians at VCU Libraries to map out the klaverns and make the raw location data publicly accessible.

With the VCU data in mind, I wondered: does the local historical KKK prevalence in 1915-1940 predict (through perpetuated racial animus) racially charged Google search rates in 2004-2007? There are some issues to note when using klavern location data:

  1. Ideally, I’d use data on the number of members by geographic areas. (Ignoring underlying population figures for a moment: Imagine there are 3 klaverns with 10 members each in DMA A. Meanwhile, there is 1 klavern with 100 members in DMA B. The Klan is more prevalent in terms of raw members in DMA B, but with only the location data in tow, DMA A would seem to have a more prevalent Klan presence.) Unfortunately, I am not aware of data on historical Klan membership by finer levels (DMA-level) of US geography.
  2. The location data are not necessarily complete. Such is the nature of collecting data from a different era. They are based on historical research and investigative work, but klavern locations are likely missing.

So, simply put, the question is: can I find empirical evidence of local continuity in American racial animus by merging historical and modern metrics? Will klavern prevalence per capita (1915-1940) meaningfully predict racially charged Google searches (2004-2007)?

Maps and regressions

First things first. Let’s map both klaverns/million and racially charged search rates by DMA. The distribution of klaverns/million turned out to be so skewed that mapping the raw rate made the geographic variation almost impossible to decipher visually. Using log(klaverns/million) makes the geographic variation visually interpretable. (See below.)

map_both.png

While I interpreted the two metrics as proxies of racial animus at different times in history, they don’t obviously track each other through time and space visually. The possible story (that one meaningfully predicts the other) isn’t very convincing at this stage. But, it’s worth seeing the output from a simple regression. (I regress on log(klaverns/million) because the raw klaverns/million distribution is heavily skewed, which produced a nonlinear relationship with the racially charged search rate and noticeable heteroskedasticity; the log transformation addresses both. I’m open to criticism/discussion on such log transformations.)
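For concreteness, here is a minimal sketch of that regression step. The file and column names are my own illustrative placeholders; the real versions live in the R notebook linked at the end of the post.

```r
# Minimal sketch: DMA-level regression of racially charged search rate on
# log(klaverns per million). File and column names are illustrative placeholders.
library(dplyr)
library(readr)

dma <- read_csv("dma_merged.csv") %>%               # hypothetical merged dataset
  mutate(log_klaverns_pm = log(klaverns_per_million))

fit <- lm(rc_search_rate ~ log_klaverns_pm, data = dma)
summary(fit)   # coefficient, significance stars, and R^2 discussed below
```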

predict.png

While the independent variable is statistically significant (star, star, star), the variation in log(klaverns/million) explains only 3-4% of the variation in racially charged Google search rates. In terms of magnitude, a 1% increase in klaverns/million is associated with a 0.02645 unit increase in the racially charged Google search rate. That is economically meaningless considering that the rates range from 25-155. This is also without controlling for any analogs of the V&V covariates that would need to be included to make a convincing case for the robustness of any supposed relationship.

In the end, the VCU location data, aggregated up to DMA-level totals, are not a meaningful predictor of the racial animus revealed by Google behavior. This could be for many reasons. It could be because klavern location totals are not accurate depictions of KKK prevalence (the data reveal locations rather than member totals, and could be incomplete). Moreover, on a more philosophical level, does this project even test the persistence of racial animus?… or is it asking a question about how language squares with historical locations of institutions? I’ll leave that up to my academic counterparts in Political Theory. (CJ, you’re up.)

Why it wasn’t meant to be… a paper

Now, it is worth explaining why this idea was always predestined to stay in blog-land. For one, even an extremely powerful positive result wouldn’t “shift priors” (as economists say to one another) — the result would seem obvious and so, who cares? Voigtländer and Voth’s paper was publishable due to the long-term scope (600 years!) and fine geographic level (towns and cities in Germany) of their results. The concept that attitudes could be perpetuated over time wasn’t the selling point; it was the fact that attitudes could be perpetuated for so long and in such a locally continuous way. If their geographies had been coarser and their data hadn’t covered such a long period of time, the paper would not have landed in the QJE. Since I asked whether 1915-1940 attitudes would predict 2004-2007 attitudes, and I was limited to large designated market areas (due to the nature of the Google data), my findings were doomed to be unremarkable even if incredibly powerful (which… they were not).

Despite its inevitable end, I did learn a lot from this spring fling. On a qualitative level, I learned about the historical omnipresence of the KKK from an impressive digital humanities project. On a technical level, I learned how to map DMA areas in R (very tricky). And, on a philosophical level, I learned how to cleanly break up with a project idea. (Thanks for all the optimal stopping lessons, David.)

Data & code
  1. Seth’s racial animus data is here.
  2. VCU data on klaverns is here.
  3. My R notebook for this project is here.
  4. My Github repo for this project is here.

© Alexandra Albright and The Little Dataset That Could, 2018. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 


Text Me Back: A Year of LDR Communication

Line Charts, Scatter Plots
Motivation

I spent some time in February figuring out how to download my iMessage data. The idea was that I could then use that data to make Jesse an R Notebook Valentine. And it worked!

valentine

A slide from my February R-ladies talk. (Full slides here.)

That Valentine was an investigation into word usage (via tf-idf) and emotional tinges in messages (via sentiment dictionaries). I treated all the messages as two aggregated blocks (one from Jesse, one from me) and was not concerned with the time dimension (when messages were sent). However, I’ve regularly hypothesized to Jesse that I know exactly what a graph of our iMessages would look like over time.

Why’s that? Well, Jesse and I are in an LDR, which means we are accustomed to going weeks without seeing each other. When you live with your SO, you come home to the same place. There isn’t a ton of need to send messages about your day, as you’ll see them soon and can update them in person. When you live across the country, you share text updates frequently since you might not catch up over phone/video for a few days. Pretty intuitive, right? So, my hypothesis was this: message frequency is ridiculously inversely correlated with being in the same city. (I.e., together, message number low; apart, message number high.)

Visually testing my hypothesis

To test this, I wanted to plot the number of daily messages between us over the past year and then mark the periods when we were actually in the same place. Turns out, yes, you can perfectly identify when we are in the same city by plotting our daily iMessage frequency!

LDR_year

I am very proud of this visual since it fits my hypothesis perfectly and is a succinct visual story about our virtual communication. Beyond the aesthetics, it also depicts an important part of my life that often is invisible to people. So, I am proud to own both the graph and the reality it depicts.

Relevant to recent concerns about data privacy, Brianna McHorse pointed out that this visual is also an example of how effectively and easily inferences can be made using personal data. I think that’s an incredibly powerful point. Jesse and I obviously didn’t intend to map out our visit schedule with our messaging. But, the picture of our time together/apart is loud and clear nonetheless.

On a final, humbled note, I am very thankful to live in a time and place where I have the technology to keep so significantly in touch with someone on the other side of the country despite our physical distance.

Still, I can’t wait to be back in the green.

Technical notes

If you’d like to download your own iMessage data, it’s very simple! (Though it did take me a while to figure out and piece together a full protocol back in February — thanks to Gulya and Bad Memes et al. for help.) Necessary steps are explained at the start of this R notebook, which also includes all code necessary to make a similar plot. Major props go to Aaron Parecki for building the command machinery used in the data extraction process.
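For a rough sense of the plotting step, here is a minimal sketch. It assumes you have already extracted the messages into a data frame `messages` with one row per message and a POSIXct `timestamp` column; the extraction protocol itself is in the notebook above.

```r
# Minimal sketch: daily iMessage counts over time. Assumes a `messages` data
# frame with a POSIXct `timestamp` column produced by the extraction steps.
library(dplyr)
library(ggplot2)

daily <- messages %>%
  mutate(day = as.Date(timestamp)) %>%
  count(day, name = "n_messages")

ggplot(daily, aes(x = day, y = n_messages)) +
  geom_line() +
  labs(x = NULL, y = "iMessages per day",
       title = "A year of LDR communication, one day at a time")
```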

Note to self: It could also be informative to plot daily totals of words (or maybe characters?) sent over iMessage since many messages are emojis, a few words, and reactions to other messages. Ah, to be a millennial.


© Alexandra Albright and The Little Dataset That Could, 2018. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

Social Networks and the Endurance of Economic Injury

Models
Killing two birds with one paper

My last fall semester exam was for Social Economics class. Reading through packets of model summaries, I set out to determine which model — besides the obviously lovable Coate and Loury (1993) — I would most like to understand, remember, and explain. That is, I picked a paper to write about here… which I could also use in my intellectual battle with those pesky blue books.

It was in the middle of the lecture packet on “Intergenerational Mobility” that I found Bowles and Sethi (2006). This paper illustrates how magically ridding the world of discrimination (i.e., saying “assume no discrimination” as an economist) doesn’t necessarily lead to perfect convergence of group economic outcomes. In other words, even with zero discrimination, group differences in economic success can still persist across generations. Why is that? Because social networks matter for economic outcomes, and networks are often segregated by group identity. In the authors’ words,

“Group differences in economic success may persist across generations in the absence of discrimination against the less affluent group because racial segregation of friendship networks, mentoring relationships, neighborhoods, workplaces and schools places the less affluent group at a disadvantage in acquiring the things — contacts, information, cognitive skills, behavioral attributes — that contribute to economic success.”

Social networks are undeniably important in determining individuals’ economic outcomes. As such, building social network structures into models of human capital accumulation improves realism and allows for intriguing intergenerational theoretical results.

Bowles and Sethi (2006) appeals to me because the model formalizes dynamics touched on in many conversations about the long-term impacts of discriminatory practices. In this post I will lay out the model mechanics, explain proofs of the key results, and showcase a graph the authors use to visually illustrate their theoretical results.

Model mechanics

The paper motivates the model with a few words on Brown v. Board and the black-white wage gap. “Many hoped that the demise of legally enforced segregation and discrimination against African Americans during the 1950s and 1960s coupled with the apparent reduction in racial prejudice among whites would provide an environment in which significant social and economic racial disparities would not persist.” Despite initial convergence from the 1950’s to the 1970’s, the gap has persisted. There are many reasons why this could be the case, continued practices of discrimination included. Bowles and Sethi use the following model to illustrate how such gaps could endure even absent discrimination.

In said model, a person is born into one of two groups — black or white — and lives for two periods. In the first period of life, she makes a decision about whether or not to acquire human capital and become ‘skilled.’ This is a simple binary choice. (She either becomes educated/trained or she does nothing.) In the second period, she is paid a wage based on her previous choice. If she didn’t acquire human capital (and thus isn’t skilled), she is paid a wage of 0. If she did (and thus is skilled), she is paid a wage of h. In effect, the marginal benefit of human capital acquisition is h for all agents.

For the sake of simplicity, the model first assumes all people have the same ability. As such, an individual’s cost of human capital acquisition is solely dependent on the level of human capital in that person’s social network. Define network capital, q, as the fraction of agents in the network who chose to acquire human capital and are skilled. The key assumption in the model is that, given the cost function c(q), c'(q)<0. In words, the higher the fraction of skilled people in a person’s network, the less costly it is for the person to become skilled. I.e., acquiring training is less costly when your network can connect you with opportunities and provide you with relevant information.

As per usual, agents choose to become skilled if marginal benefit exceeds marginal cost. Assume c(0)>h>c(1) — that is, the cost of becoming skilled when no one in your network is skilled (q=0) is higher than the benefit of becoming skilled (h), but the cost when everyone is skilled (q=1) is lower than the benefit (h). In effect, there exists a unique threshold level q* such that c(q*)=h. The agent’s decision rule is then: for any q>q*, the agent chooses to become skilled & for any q<q*, the agent does not. (I’ll ignore indifference throughout.)

While the decision rule is clear, how are social networks (and thus q‘s) formed? We assume the population shares for B (black) and W (white) groups are x and 1-x, respectively. Moreover, agents born into the model in period t+1 have a large number of ties to those born in t. With probability p in [0,1] an associate is from the same group (B or W), but with probability (1-p) an associate is randomly picked from the general population of agents (could be either group). As such, the parameter p is the degree of “in-group bias” or segregation. Assume the parameter is the same for both groups. Therefore, the probability that: a black agent’s connection is also black is p+(1-p)x, a white agent’s connection is also white is p+(1-p)(1-x), a black agent’s connection is white is (1-p)(1-x), and a white agent’s connection is black is (1-p)x.

The network capital in t+1 depends on the mechanical formation of the agent’s group and human capital accumulation decisions made by black and white agents born in time t (represented by sB(t) and sW(t), respectively):

qB(t+1)=[p+(1-p)x] * sB(t) + [(1-p)(1-x)] * sW(t) 

qW(t+1)=[(1-p)x] * sB(t) + [p+(1-p)(1-x)] * sW(t) 

The above equations show that (for both groups) the fraction of connections in an agent’s network (born in t+1) who are skilled is: chance of black associate * fraction of black agents (born in t) who are skilled + chance of white associate * fraction of white agents (born in t) who are skilled. The network capital of people in the two groups is the same only if: p=0 (there is no segregation) or sW(t)=sB(t) (there is no initial group inequality in human capital).

Given the two above equations, we get a “law of motion” for human capital decisions: If qG(t+1)>q*, sG(t+1)=1; If qG(t+1)<q*, sG(t+1)=0 (with G in {B,W}). In words, if network capital is above the necessary threshold level, all agents of that group become skilled. If network capital is below the necessary threshold level, all agents of that group stay unskilled. Note that in this simplified model all agents make the same decisions within racial groups.
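To make that law of motion concrete, here is a minimal simulation sketch. The cost function and parameter values are my own illustrative choices (anything satisfying c'(q)<0 and c(0)>h>c(1) works), and the two example calls preview the polar cases discussed in the next section.

```r
# Minimal sketch of the homogeneous-ability dynamics.
# c_fun is an illustrative cost function with c'(q) < 0 and c(0) > h > c(1).
iterate_skills <- function(p, x, h = 1,
                           c_fun = function(q) 2 - 1.5 * q,
                           s_B = 0, s_W = 1, periods = 10) {
  for (t in seq_len(periods)) {
    q_B <- (p + (1 - p) * x) * s_B + (1 - p) * (1 - x) * s_W
    q_W <- (1 - p) * x * s_B + (p + (1 - p) * (1 - x)) * s_W
    s_B <- as.numeric(c_fun(q_B) < h)   # invest iff cost below benefit
    s_W <- as.numeric(c_fun(q_W) < h)
  }
  c(s_B = s_B, s_W = s_W)
}

iterate_skills(p = 1, x = 0.2)   # complete segregation: (0, 1) persists
iterate_skills(p = 0, x = 0.2)   # complete integration: converges to (1, 1)
```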

From parameter values to group outcomes

How do we get real-world implications from this model? We know black people have been historically economically disadvantaged in the United States. But, how do we integrate this fact into the model’s framework? Well, we can set the initial state of the world to the extreme (sB, sW)=(0,1), meaning all black agents start off as unskilled and all white agents start off as skilled (perhaps due to separate but unequal hospitals/schools/etc). Based on that initial state, I can then see what the future states of the world will be under the previously derived law of motion.

  1. Let’s assume complete integration, p=0. Given (sB, sW)=(0,1), then qW(t+1)=qB(t+1)=1-x, and since cost is only dependent on network, cost is then c(1-x) for both groups. Thus, all black and white agents will make the same decisions and there will be no asymmetric stable steady state.
  2. Now, consider complete segregation, p=1. Given (sB, sW)=(0,1) again, then qB=0 and qW=1. So, cost is c(0) for black agents and c(1) for white agents. Recall c(0)>h>c(1), meaning that there is necessarily an asymmetric stable steady state. (No black agents will become skilled and all white agents will become skilled.)

Given the points above, the authors explain,

“Since there exists an asymmetric stable steady state under complete segregation but none under complete integration, one may conjecture that there is a threshold level of segregation such that persistent group inequality is feasible if and only if the actual segregation level exceeds this threshold.”

Let’s prove this conjecture. (The following is my summary of the appendix proofs for propositions 1 and 2.) First, we find the unique x” (black population share) threshold s.t. c(1-x”)=h.

  1. Consider x'<x”, then c(1-x’)<h because cost decreases in its argument. Given (sB, sW)=(0,1), qB=(1-p)(1-x’) and qW=p+(1-p)(1-x’). So, c(qW) is decreasing in p and less than h at p=0 (since c(1-x’)<h). Moreover, c(qB) is increasing in p and c(qB)=c((1-0)(1-x’))<h when p=0 but c(qB)=c(0)>h when p=1, thus there is a unique p'(x’) such that c(qB)=h. For all p>p'(x’), we have c(qW)<h<c(qB), meaning (sB,sW)=(0,1) is a steady state. But, for all p<p'(x’), we have c(qW)<c(qB)<h, which makes it optimal for both groups of workers to become skilled and so there is a transition to (1,1). Since that then lowers both costs, the condition c(qW)<c(qB)<h continues to hold which makes (1,1) the stable steady state instead of (0,1).
  2. Consider x’>x”, so c(1-x’)>h. By the same logic, c(qB) is increasing in p and greater than h when p=0 since c(qB)=c(1-x’)>h. So c(qB)>h for all p. Meanwhile, c(qW) is decreasing in p, with c(qW)=c((1-0)(1-x’))>h when p=0 but c(qW)=c(1)<h when p=1; thus there is a unique p'(x’) such that c(qW)=h. For all p>p'(x’), we have c(qW)<h<c(qB), meaning (sB,sW)=(0,1) is a steady state. But, for all p<p'(x’), we have h<c(qW)<c(qB), which makes it optimal for both groups of workers not to become skilled, and so there is a transition to (0,0). Since that then increases both costs, the condition h<c(qW)<c(qB) continues to hold, which makes (0,0) the stable steady state instead of (0,1).

In sum, given the fraction x, there is a threshold level of segregation p* above which (sB,sW)=(0,1) is a steady state (persistent group inequality), but below which the model shifts to a symmetric steady state. Whether the eventual steady state means welfare improving equalization — (sB,sW)=(1,1) — or welfare reducing equalization — (sB,sW)=(0,0) — depends on the fraction x. If the originally skilled group is large enough, all agents will become skilled, otherwise, all agents will become unskilled.

In words, the model shows that group inequality persists if segregation is high enough. If segregation is below the threshold for maintaining inequality, group inequality disappears, but whether that happens through a loss of everyone’s skills or a gain of everyone’s skills depends on the population shares that define the model world. The authors use the following graph to depict these conclusions:

fg1

Note that Bowles and Sethi (2006) use different variable names than I do for the parameters of interest. Also, the authors normalize the benefits of human capital accumulation to 0.

This figure sharply summarizes the model’s results thus far. It succinctly and clearly shows how two parameters (population share and segregation) determine the eventual state of the world. I usually use graphs to visualize tangible data, but they are just as useful in visualizing concepts or theoretical results, as seen here. (The graph I built depicting when to share an idea à la Koszegi is another example of visualizing how model parameters relate to outcomes.)

Suspiciously slick?

There are a few issues with the model dynamics that you might have noticed reading the above summary. Namely, everyone is the same within racial groups and convergence occurs in a single period. This feels less interesting than a slower convergence differing by other individual characteristics.

Much of the aforementioned simplicity comes from the assumption that ability is the same for all agents (i.e., ability is homogeneous). However, the model can be tweaked to make ability heterogeneous — that is exactly what the authors do later in the paper. As such, the cost of human capital investment then varies with ability as well as network capital. So, cost now depends on something that is specific to the individual (ability) as well as something common to the group (racial identity). (Note: the model assumes no group differences in cost functions or ability distributions.)

Moreover, the cost function c(a, q) is then decreasing in both ability and network capital level. In words, it is easier to become skilled when exposed to more skilled people (due to networks), and easier to become skilled when endowed with higher natural ability. For any given network capital level q, there is some threshold ability level a'(q) such that those above the cut-off become skilled and those below do not. Similar to the reasoning in the homogeneous case, an agent needs c(a, q)<h to become skilled. In effect, the relevant threshold is the a'(q) s.t. c(a'(q), q)=h. (Anyone with ability above that becomes skilled; anyone with ability below it does not.)

An interesting insight on this topic: “individuals belonging to groups exposed to higher levels of human capital will themselves accumulate human capital at lower ability thresholds relative to individuals in groups with initially lower levels of human capital. This difference will be greater when segregation levels are high.” So, in this more complex build of the model, black people have to boast higher ability levels than their white counterparts to make human capital accumulation cost-beneficial… all due to the historical disadvantages built into their social networks. And that all came out of a bunch of threshold rules and variables!

Recap: behold the power of models

Bowles and Sethi built this model with an eye towards examples of enduring economic injury. They saw an empirical fact (the persistent black-white wage gap) and then put structure on their intuitive answers to naturally occurring questions: If there was zero discrimination, would the wage gap still endure? How would that work? Through what channels?

At the end of the day, their model hinges on a few items: (1) the inverse relationship between network capital and cost of skill acquisition, and (2) network formation (as influenced by segregation and population share). The first is an assumption based in the reality of human success and failure — it doesn’t always “take a village” but that definitely helps. The second is a distilled sketch of a complex and idiosyncratic process —  network formation depends on two parameters (segregation and population share) and leads to useful, comprehensible insights. The previously showcased graph is especially important for highlighting the potential difficulty of policy decisions — what part of the graph are we in?

Blue book’d

Bowles and Sethi (2006) illustrates how social networks, population demographics, and decision-making interact to determine the endurance of economic injury. The model also illustrates how writing a blog can sometimes help in your academic life — as it turns out, I managed to describe and solve out pieces of this model in my Social Economics blue book exam. Two birds, one paper.


© Alexandra Albright and The Little Dataset That Could, 2018. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

On ego and sharing ideas

Line Charts, Models
A behavioral Nobel means a behavioral blog post

In honor of Richard Thaler’s recent Nobel prize win, I give you a post on a behavioral economics topic! Welcome to round 2 of A G2 Talks Models. Today’s topic: ego utility and the decision to speak up in class à la Koszegi 2006, an approach to belief-based utility adapted from Psychology and Economics lecture.

An admission of ego

I have a confession to make. I want you to think I’m smart. There, I said it. It is important to my self-image that you (yes, you on the other side of this screen!), my academic peers, and even the man (boy?) who tipsily mansplained the Monty Hall problem to me perceive me as intelligent. That is, intelligence as signaled by the occasional insightful comment, deep question, or quality idea that I get up the nerve to share.

Classrooms are environments in which lots of signaling of such smarts takes place. Professors ask questions, both rhetorical and not. They let us marinate in pregnant pauses and make a call for ideas. There’s a beat in which the tiny neuron-bureaucrat who is tasked with managing and organizing my brain activity files through some nascent concepts and responses. Is this any good?, she asks. Her supervisor isn’t sure either. Is this relevant? Yeah, but, is it too obvious? The supervisor prods her saying, time is of the essence. Internal hesitation over whether or not to share an idea in class still plagues me even after multiple decades of participation in the exercise. However, the difference is that now, in 18th grade, I can explicitly model that very idea-sharing decision.

A twist on classical utility: Enter ego…

To model this decision, I enter into a belief-based utility world. I define my utility function as follows: u = r – e + g√p, with r being the classroom response to the idea, e being the effort cost of sharing the idea, p being my perceived probability that the idea is high quality, and g being a parameter for “ego utility.” In classical utility world, this g√p term would not exist; I would simply weigh the benefit of sharing the idea r against the cost of sharing the idea e. Moreover, in a departure from classical economics assumptions, this form of belief-based utility displays information aversion, thus the square root on the p.

Now, let’s run through the outcomes based on class participation or class non-participation. If I take the jump and share my idea, I always expend some amount of effort e>0. Meanwhile, the benefit I derive depends on the ex post observable quality of the idea, as measured by the classroom response. If the idea was high quality, I gain r=1. If instead it was lacking or, shall we say, basic, I gain r=b where 1>b>0. If I keep my thoughts to myself, then e=0 and r=0.

In effect, if I share my idea, I receive u = p(1-e+g√1) + (1-p)(b-e+g√0). The first term on the right hand side is my perceived probability that the idea is of high quality multiplied by the associated payoff, while the second term is my perceived probability that the idea is basic multiplied by that associated payoff. Rearranging terms, u = p(1+g) + (1-p)b – e. Meanwhile, if I stay silent, my payoff is simply u = g√p.

Using this simple framework, I will share my idea with the class if and only if p(1+g) + (1-p)b – e > g√p. Simplified, I share my idea if and only if g(p-√p) + p + (1-p)b – e > 0.

Given this inequality, we can see that if g goes to infinity (i.e., my ego utility is huge), and p is not 0 or 1, the inequality will never hold, as p-√p will always be negative (since p is a fraction between 0 and 1); this means I will never raise my hand to share my ideas because I am so paralyzed by my massive ego utility. Meanwhile, as p approaches 0 or 1, the g(p-√p) term goes to 0, leaving the decision up to the inequality p + (1-p)b > e. Thus, I will speak if the expected value of the payoff to my comment exceeds the effort cost. (Recall that this is exactly what I do in the classical utility case in which I have no ego utility.)

While both of these two above conclusions seem predictable, there is a notable intriguing prediction from this model. You might expect that the greater my perceived probability that the idea is high quality, the more likely I am to share the idea. Well, this is not true. In other words, there is non-monotonicity in p. Say I have a moderate level of ego utility g and my p grows from a low to a higher level. This positive change in p could cause me to put my hand down even though now I am more confident in the quality of my idea. Weird! Ego utility allows there to be a negative correlation between my confidence in my idea’s quality and my willingness to share said idea.

Intuitively, as I become more confident in an idea, not only is there is a higher expected benefit to sharing the idea but there is also a higher possible loss of utility due to the ego utility term. The way these two opposing effects spar with one another can lead my hand to go up, down, and up again as my confidence in an idea increases.

Let’s illustrate this surprising concept graphically. We can parametrize the model and make visually explicit how the decision to raise my hand changes with p. Let’s set g=3, b=0.5, e=0.01. Given these values, I will share my idea if and only if 3(p-√p) + p + (1-p)0.5 – 0.01 > 0. I.e., iff 3.5p – 3√p + 0.49 > 0. As such, I can graph the utility function over the full range of possible p values from 0 to 1 and accordingly color areas depending on whether or not they correspond to sharing an idea. (I share an idea if the function yields a value greater than or equal to 0; otherwise, I do not.)
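Here is a minimal sketch of how that parametrized decision plot could be produced; the full version is in the R notebook linked in the endnote.

```r
# Minimal sketch: net gain from sharing vs. staying silent, with g = 3,
# b = 0.5, e = 0.01, colored by whether sharing is optimal.
library(ggplot2)

g <- 3; b <- 0.5; e <- 0.01
p <- seq(0, 1, by = 0.001)
net_gain <- g * (p - sqrt(p)) + p + (1 - p) * b - e   # share iff >= 0

df <- data.frame(p = p, net_gain = net_gain, share = net_gain >= 0)

ggplot(df, aes(x = p, y = net_gain, color = share)) +
  geom_line() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Perceived probability the idea is high quality (p)",
       y = "Gain from sharing minus gain from staying silent")
```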

share

The above illustrates that I am willing to share an idea when my probability that it is high quality is very low, but that I am no longer willing once the probability is a more moderately low value. This is evidence of the non-monotonicity in p in this model; I might lower my hand in class to protect my ego.

Anecdotal evidence, dynamics, and blog posting

I find ego utility fascinating and very believable when reflecting on my own experiences. For one, I have noticed that I often become more silent as conversation topics sway from ones in which I am a novice to ones about which I am moderately more knowledgeable. I feel acutely aware of the aforementioned tensions in the model; yes, I am more confident in my ideas in this realm, but I now have more to lose if I choose to share them. This is also a hesitation I feel internally when I talk with professors and friends about ideas. If the idea is undeveloped, there is really no harm in sharing it (p is low at that point); but, if I have been working on it and have a higher p, now there is a chance I might realize that my idea was not up to snuff. In this sense, I can sometimes feel myself keeping ideas or projects to myself, as then they can’t be externally revealed to be low quality. I can sit on the sidelines and nurture my pet projects without a care in the world, stroking the ego-related term in my utility function.

But, in a more complex model, perhaps one that better represents my reality, idea quality is improved with idea sharing and collaboration. The model at hand is a one-shot game. I have an idea and I decide whether or not to share it. (The end.) But, in my flesh-and-blood/Stata-and-R universe, ideas do not disappear after that first instance of sharing; they develop dynamically. If I imagine refitting the model to mimic my reality, it is clear that silence for ego appeasement is a strategy that does not pay off long term…

I like to think that this is one reason why I write these posts — to share and accordingly develop ideas. In fact, when I started sharing R code online almost three years ago, I was such a novice that I had a very low p regarding my data visualization capacities. In this way, ego utility was not able to hold me back from openly sharing my scripts. I was a strong advocate for transparency (still am) and at that time didn’t mind at all if my code looked like “a house built by a child using nothing but a hatchet and a picture of a house.” However, if I were to imagine starting blogging now, I could see holding off, as I perceive my probability of being a decent coder as much larger than I did three years ago.

In the end, I am very happy that I chose to start sharing my work when I had a very small p. In fact, if you squint really hard, you can probably see me lounging on the utility function curve, fumbling to use ggplot2, somewhere in that first blue chunk of the graph.

Endnote

This post adapts model mechanics from Koszegi 2006. A Psychology and Economics lecture explicitly inspired and informed this piece. Lastly, here is the R notebook used to create the graphic in this post.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

The United Nations of Words

Bar Charts

Newsletter e-mails are often artifacts of faded interests or ancient online shopping endeavors. They can be nostalgia-inducing — virtual time capsules set in motion by your past self at t-2, and egged on by your past self at t-1. Remember that free comedy show, that desk lamp purchase (the one that looks Pixar-esque), that political campaign… oof, actually let’s scratch that last one. But, without careful tending, newsletters breed like rabbits and mercilessly crowd inboxes. If you wish to escape the onslaught of red notification bubbles, these e-mails are a sworn enemy whose defeat is an ever-elusive ambition.

However, there is a newsletter whose appearance in my inbox I perpetually welcome with giddy curiosity. That is, Jeremy Singer-Vine’s “Data is Plural.” Every week features a new batch of datasets for your consideration. One dataset in particular caught my eye in the 2017.07.19 edition:

UN General Debate speeches. Each September, the United Nations gathers for its annual General Assembly. Among the activities: the General Debate, a series of speeches delivered by the UN’s nearly 200 member states. The statements provide “an invaluable and, largely untapped, source of information on governments’ policy preferences across a wide range of issues over time,” write a trio of researchers who, earlier this year, published the UN General Debate Corpus — a dataset containing the transcripts of 7,701 speeches from 1970 to 2016.

The Corpus explains that these statements are “akin to the annual legislative state-of-the-union addresses in domestic politics.” As such, they provide a valuable resource for understanding international governments’ “perspective[s] on the major issues in world politics.” Now, I have been interested in playing around with text mining in R for a while, and a rich dataset of international speeches seems like a natural application of basic term frequency and sentiment analysis methods. As I am interested in comparing countries to one another, I need to select a subset of the hundreds of member states to study. Given their special status, I focus exclusively on the five permanent UN Security Council members: the US, Britain, France, China, and Russia. (Of course, you could include many, many more countries of interest for this sort of investigation, but given the format of my desired visuals, five countries is a good cut-off.) Following in the typed footsteps of great code tutorials, I perform two types of analyses – a term frequency analysis and a sentiment analysis – to discuss the thousands of words that were pieced together to form these countries’ speeches.

Term Frequency Analysis

Term frequency analysis has been used in contexts ranging from studying Seinfeld to studying the field of 2016 GOP candidates. A popular metric for such analyses is tf-idf, which is a score of relative term importance. Applied to my context, the metric reveals words that are frequently used by one country but infrequently used by the other four. In more general terms, “[t]he tf-idf value increases proportionally to the number of times a word appears in the document, but is often offset by the frequency of the word in the corpus, which helps to adjust for the fact that some words appear more frequently in general.” (Thanks, Wikipedia.) In short, tf-idf picks out important words for our countries of interest. The 20 words with the highest tf-idf scores are illustrated below:
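For a sense of what that computation looks like in practice, here is a minimal tidytext sketch. It assumes a data frame `speeches` with `country` and `text` columns (my own illustrative names; the actual pipeline is in the notebook linked at the end).

```r
# Minimal sketch: tf-idf by country. Assumes a `speeches` data frame with
# one row per speech and columns `country` and `text`.
library(dplyr)
library(tidytext)

country_tfidf <- speeches %>%
  unnest_tokens(word, text) %>%
  count(country, word, sort = TRUE) %>%
  bind_tf_idf(word, country, n) %>%
  arrange(desc(tf_idf))

head(country_tfidf, 20)   # the 20 highest tf-idf words across the five countries
```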

tfidftotal

China is responsible for 13 of the 20 words. Perhaps this means that China boasts the most unique vocabulary of the Security Council. (Let me know if you disagree with that interpretation.) Now, if instead we want to see the top 5 words for each country–to learn something about their differing focuses–we obtain the results below:

tfidf_country

As an American, I am not at all surprised by the picture of my country as one of democratic, god-loving, dream-having entrepreneurs who have a lot to say about Saddam Hussein. Other insights to draw from this picture are: China is troubled by Western superpower countries influencing (“imperialist”) or dominating (“hegemonism”) others, Russia’s old status as the USSR involved lots of name checks to leader Leonid Ilyich Brezhnev, and Britain and France like to talk in the third-person.

Sentiment Analysis

In the world of sentiment analysis, I am primarily curious about which countries give the most and least positive speeches. To figure this out, I calculate positivity scores for each country according to the three sentiment dictionaries, as summarized by the UC Business Analytics R Programming Guide:

The nrc lexicon categorizes words in a binary fashion (“yes”/“no”) into categories of positive, negative, anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The bing lexicon categorizes words in a binary fashion into positive and negative categories. The AFINN lexicon assigns words with a score that runs between -5 and 5, with negative scores indicating negative sentiment and positive scores indicating positive sentiment.

Therefore, for the nrc and bing lexicons, my generated positivity scores will reflect the number of positive words less the number of negative words. Meanwhile, the AFINN lexicon positivity score will reflect the sum total of all scores (as words have positive scores if they possess positive sentiment and negative scores if they possess negative sentiment). Comparing these three positivity scores across the five Security Council countries yields the following graphic:
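As an illustration of one of those scores, here is a minimal sketch of the bing-based positivity calculation, again assuming the `speeches` data frame from the sketch above.

```r
# Minimal sketch: bing positivity score (positive word count minus negative
# word count) by country. Assumes the `speeches` data frame from above.
library(dplyr)
library(tidyr)
library(tidytext)

bing_positivity <- speeches %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(country, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(positivity = positive - negative)
```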

country_pos

The three methods yield different outcomes: AFINN and Bing conclude that China is the most positive country, followed by the US; meanwhile, the NRC identifies the US as the most positive country, with China in fourth place. And, despite all that disagreement, at least everyone can agree that the UK is the least positive! (How else do we explain “Peep Show”?)

Out of curiosity, I also calculate the NRC lexicon word counts for anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. I then divide the sentiment counts by total numbers of words attributed to each country so as to present the percentage of words with some emotional range rather than the absolute levels for that range. The results are displayed below in stacked and unstacked formats.

feelings1

feelings2

According to this analysis, the US is the most emotional country, with over 30% of words associated with an NRC sentiment. China comes in second, followed by the UK, France, and Russia, in that order. However, all five are very close in terms of emotional word percentages, so this ordering does not seem to be particularly striking or meaningful. Moreover, the specific range of emotions looks very similar country by country as well. Perhaps this is due to countries following some well-known framework for a General Debate speech, or perhaps political speeches in general follow some tacit emotional script displaying this mix of emotions…

I wonder how such speeches compare to a novel or a newspaper article in terms of these lexicon scores. For instance, I’d imagine that we’d observe more evidence of emotion in these speeches than in newspaper articles, as those are meant to be objective and clear (though this is less true of new forms of evolving media… i.e., those that aim to further polarize the public… or, those that were aided by one of the Security Council countries to influence an election in another of the Security Council countries… yikes), while political speeches might pick out words specifically to elicit emotion. It would be fascinating to investigate how emotional words are wielded in political speeches or new forms of journalistic media, and how that has evolved over time. (Quick hypothesis: fear is more present in the words that make up American media coverage and political discourse nowadays than it was a year ago…) But, I will leave that work (for now) to people with more in their linguistics toolkit than a novice knowledge of super fun R packages.

Code

As per my updated workflow, I now conduct projects exclusively using R notebooks! So, here is the R notebook responsible for the creation of the included visuals. And, here is the associated Github repo with everything required to replicate the analysis. Methods mimic those outlined by superhe’R’os Julia Silge and David Robinson in their “Text Mining with R” book.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

You think, therefore I am

Models
Intro

Hello world, I am now a G2 in economics PhD-land. This means I have moved up in the academic hierarchy: [1] I now reign over my very own cube, and [2] I am taking classes in my fields of interest. For me that means: Social Economics, Behavioral Economics, and Market Design. That also means I am coming across a lot of models, concepts, and results that I want to tell you (whoever you are!) about. So, please humor me in this quasi-academic-paper-story-time… Today’s topic: Coate and Loury (1993)’s model of self-fulfilling negative stereotypes, a model presented in Social Economics.

Once upon a time…

There was a woman who liked math. She wanted to be a data scientist at a big tech company and finally don the company hoodie uniform she kept seeing on Caltrain. Though she had been a real ace at cryptography and tiling theory in college (this is the ultimate clue that this woman is not based on yours truly), she hadn’t been exposed to any coding during her studies. She was considering taking online courses to learn R or Python, or maybe one of those bootcamps… they also have hoodies, she thought.

She figured that learning to code and building a portfolio of work on Github would be a meaningful signal to potential employers as to her future quality as an employee. But, of course, she knew that there are real, significant costs to investing in developing these skills… Meanwhile, in a land far, far away – in an office rife with ping pong tables – individuals at a tech company were engaged in decisions of their own: the very hiring decisions that our math-adoring woman was taking into account.

So, did this woman invest in coding skills and become a qualified candidate? Moreover, did she get hired? Well, this is going to take some equations to figure out, but, thankfully, this fictional woman as well as your non-fictitious female author dig that sort of thing.

Model Mechanics of “Self-Fulfilling Negative Stereotypes”

Let’s talk a little about this world that our story takes place in. Well, it’s 1993 and we are transported onto the pages of the American Economic Review. In the beginning Coate and Loury created the workers and the employers. And Coate and Loury said, “Let there be gender,” and there was gender. Each worker is also assigned a cost of investment, c. Given the knowledge of personal investment cost and one’s own gender, the worker makes the binary decision between whether or not to invest in coding skills and thus become qualified for some amorphous tech job. Based on the investment decision, nature endows each worker with an informative signal, s, which employers then can observe. Employers, armed with knowledge of an applicant’s gender and signal, make a yes-no hiring decision.

Of course, applicants want to be hired and employers want to hire qualified applicants. As such, payoffs are as follows: applicants receive w if they are hired and did not invest, w-c if they are hired and invested, 0 if they are not hired and did not invest, and -c if they are not hired and invested. On the tech company side, a firm receives $q if they hire a qualified worker, -$u if they hire an unqualified worker, and 0 if they choose not to hire.

Note importantly that employers do not observe whether or not an applicant is qualified. They just observe the signals distributed by nature. (The signals are informative and we have the monotone likelihood ratio property… meaning the better the signal the more likely the candidate is qualified and the lower the signal the more likely the candidate isn’t qualified.) Moreover, gender doesn’t enter the signal distribution at all. Nor does it influence the cost of investment that nature distributes. Nor the payoffs to the employer (as would be the case in the Beckerian model of taste-based discrimination). But… it will still manage to come into play!

How does gender come into play then, you ask? In equilibrium! See, in equilibrium, agents seek to maximize expected payoffs. And, expected payoffs depend on the tech company’s prior probability that the worker is qualified, p. Tech companies then use p and the observed signal to update their beliefs via Bayes’ Rule. So, the company now has some posterior probability, B(s,p), that is a function of p and s. The company’s expected payoff is thus B(s,p)($q) – (1-B(s,p))($u), since that is the probability the candidate is qualified times the gain from hiring a qualified candidate, less the probability the candidate is unqualified times the penalty from hiring an unqualified candidate. The tech company will hire a candidate if that difference is greater than or equal to 0. In effect, the company decision is then characterized by a threshold rule: accept applicants with signal greater than or equal to s*(p), where s*(p) is the signal at which the expected payoff equals 0. Now, note that this s* is a function of p. That’s because if p changes in the equation B(s,p)($q) – (1-B(s,p))($u)=0, there’s now a new s that makes it hold with equality. In effect, tech companies hold different genders to different standards in this model. Namely, it turns out that s*(p) is decreasing in p, which means intuitively that the more pessimistic employer beliefs are about a particular group, the harder the standard that group faces.
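To see the threshold rule numerically, here is a minimal sketch. The signal distributions and payoff values are my own illustrative assumptions (chosen to satisfy the monotone likelihood ratio property), not anything taken from Coate and Loury’s paper.

```r
# Minimal sketch of the employer's threshold s*(p). Illustrative assumptions:
# qualified workers draw s ~ N(1, 1), unqualified s ~ N(0, 1); the firm gains
# q_gain from a qualified hire and loses u_loss from an unqualified one.
s_star <- function(p, q_gain = 1, u_loss = 1) {
  posterior <- function(s) {
    num <- p * dnorm(s, mean = 1)
    num / (num + (1 - p) * dnorm(s, mean = 0))
  }
  expected_payoff <- function(s) {
    posterior(s) * q_gain - (1 - posterior(s)) * u_loss
  }
  uniroot(expected_payoff, interval = c(-10, 10))$root   # hire iff s >= s*(p)
}

s_star(p = 0.5)   # standard faced by the group with the more favorable prior
s_star(p = 0.2)   # a more pessimistic prior implies a higher hiring bar
```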

So, let’s say, fictionally, that tech companies thought, hmmm I don’t know, “the distribution of preferences and abilities of men and women differ in part due to biological causes and that these differences may explain why we don’t see equal representation of women in tech and leadership” [Source: a certain memo]. Such a statement about differential abilities yields a lower p for women than for men. In this model, that means women will face higher standards for employment.

Now, what does that mean for our math-smitten woman who wanted to decide whether to learn to code or not? In this model, workers anticipate standards. Applicants know that if they invest, they receive (probability of being above the standard as a qualified applicant)*w + (probability of falling below the standard as a qualified applicant)*0 – c. If they don’t invest, they receive (probability of being above the standard as an unqualified applicant)*w + (probability of falling below the standard as an unqualified applicant)*0. Workers invest only if the former is greater than or equal to the latter. If the model’s standard is higher for women than for men, as the tech companies’ prior probability that women are qualified is smaller than it is for men, then the threshold for investing will be higher for women than it is for men.

So, if in this model-world, that tech company (with all the ping pong balls) is one of a ton of identical tech companies that believe, for some reason or another, that women are less likely to be qualified than men for jobs in the industry, women are then induced to meet a higher standard for hire. That higher standard, in effect, is internalized by women who then don’t invest as much. In the words of the original paper, “In this way, the employers’ initial negative beliefs are confirmed.”

The equilibrium, therefore, induces worker behavior that legitimizes the original beliefs of the tech companies. This is a case of statistical discrimination that is self-fulfilling. It is an equilibrium that is meant to be broken, but it is incredibly tricky to do so. Once workers have been induced to validate employer beliefs, then those beliefs are correct… and, how do you correct them?

I certainly don’t have the answer. But, on my end, I’ll keep studying models and attempting to shift some peoples’ priors…

Screen Shot 2017-09-06 at 10.56.41 PM

Oh, and my fictional female math-enthusiast will be endowed with as many tech hoodies as she desires. In my imagination, she has escaped the world of this model and found a tech company with a more favorable prior. A girl can dream…

Endnote

This post adapts Coate and Loury (1993) to the case of women in tech in order to demonstrate and summarize the model’s dynamics of self-fulfilling negative stereotypes. Discussion and lecture in Social Economics class informed this post. Note that these ideas need not be focused on gender and tech. They are applicable to many other realms, including negative racial group stereotypes and impacts on a range of outcomes, from mortgage decisions to even police brutality.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 

Senate Votes Visualized

Grid Maps

It has been exactly one week since the Senate voted to start debate on repealing Obamacare. Three repeal proposals followed in the wake of that original vote. Each one failed, but in a different way. News outlets such as the NYTimes did a great job reporting how each Senator voted on all the proposals. I then used that data to geographically illustrate Senators’ votes for each Obamacare-related vote. See below for a timeline of this past week’s events and accompanying R-generated visuals.

Tuesday, July 25th, 2017

The Senate votes to begin debate.

deb_final

This passes 51-50 with Pence casting the tie-breaking vote. The visual shows the number of (R) and (D) Senators in each state as well as how those Senators voted. We can easily identify Collins and Murkowski, the two Republicans who voted NO, by the purple halves of their states (Maine and Alaska, respectively). While Democrats vote as a bloc in this case and in the impending three proposal votes, it is the Republicans who switch between NO and YES over the course of the week of Obamacare votes. Look for the switches between red and purple.

Later that day…

The Senate votes on the Better Care Reconciliation Act.

rr_final

It fails 43-57 at the mercy of Democrats, Collins, Murkowski, and a more conservative bloc of Republicans.

Wednesday, July 26th, 2017

The Senate votes on the Obamacare Repeal and Reconciliation Act.

pr_final

It fails 45-55 at the mercy of Democrats, Collins, Murkowski, and a more moderate bloc of Republicans.

Friday, July 28th, 2017

The Senate votes on the Health Care Freedom Act.

sk_final

It fails 49-51 thanks to Democrats, Collins, Murkowski, and McCain. To hear the gasp behind the slice of purple in AZ, watch the video below.

Code

This was a great exercise in using a few R packages for the first time. Namely, geofacet and magick. The former is used for creating visuals for different geographical regions, and is how the visualization is structured to look like the U.S. The latter allows you to add images onto plots, and is how there’s a little zipper face emoji over DC (as DC has no Senators).
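For a flavor of the geofacet piece, here is a minimal sketch. The `senate_votes` data frame and its columns are illustrative assumptions, and the real plotting code (including the magick emoji overlay) is in the notebook below.

```r
# Minimal sketch: one small panel per state arranged like a US map. Assumes a
# `senate_votes` data frame with columns `state`, `senator`, and `vote`.
library(ggplot2)
library(geofacet)

ggplot(senate_votes, aes(x = senator, fill = vote)) +
  geom_bar() +
  facet_geo(~ state) +   # arranges facets in a US-shaped grid
  theme_void() +
  theme(strip.text = element_text(size = 6))
```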

In terms of replication, my R notebook for generating included visuals is here. The github repo is here.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.