You think, therefore I am

Models
Intro

Hello world, I am now a G2 in economics PhD-land. This means I have moved up in the academic hierarchy; [1] I now reign over my very own cube, and [2] I am taking classes in my fields of interest. For me that means: Social Economics, Behavioral Economics, and Market Design. That also means I am coming across a lot of models, concepts, results that I want to tell you (whoever you are!) about. So, please humor me in this quasi-academic-paper-story-time… Today’s topic: Coate and Laury (1993)’s model of self-fulfilling negative stereotypes, a model presented in Social Economics.

Once upon a time…

There was a woman who liked math. She wanted to be a data scientist at a big tech company and finally don the company hoodie uniform she kept seeing on Caltrain. Though she had been a real ace at cryptography and tiling theory in college (this is the ultimate clue that this woman is not based on yours truly), she hadn’t been exposed to any coding during her studies. She was considering taking online courses to learn R or Python, or maybe one of those bootcamps… they also have hoodies, she thought.

She figured that learning to code and building a portfolio of work on Github would be a meaningful signal to potential employers as to her future quality as an employee. But, of course, she knew that there are real, significant costs to investing in developing these skills… Meanwhile, in a land far, far away–in an office ripe with ping pong tables–individuals at a tech company were engaged in decisions of their own: the very hiring decisions that our math-adoring woman was taking into account.

So, did this woman invest in coding skills and become a qualified candidate? Moreover, did she get hired? Well, this is going to take some equations to figure out, but, thankfully, this fictional woman as well as your non-fictitious female author dig that sort of thing.

Model Mechanics of “Self-Fulfilling Negative Stereotypes”

Let’s talk a little about this world that our story takes place in. Well, it’s 1993 and we are transported onto the pages of the American Economic Review. In the beginning Coates and Laury created the workers and the employers. And Coates and Laury said, “Let there be gender,” and there was gender. Each worker is also assigned a cost of investment, c. Given the knowledge of personal investment cost and one’s own gender, the worker makes the binary decision between whether or not to invest in coding skills and thus become qualified for some amorphous tech job. Based on the investment decision, nature endows each worker with an informative signal, s, which employers then can observe. Employers, armed with knowledge of an applicant’s gender and signal, make a yes-no hiring decision.

Of course, applicants want to be hired and employers want to hire qualified applicants. As such, payoffs are as follows: applicants receive w if they are hired and did not invest, w-c if they are hired and invested, and 0 if they are not hired. On the tech company side, a firm receives $q if they hire a qualified worker, -$u if they hire an unqualified worker, and 0 if they choose not to hire.

Note importantly that employers do not observe whether or not an applicant is qualified. They just observe the signals distributed by nature. (The signals are informative and we have the monotone likelihood ratio property… meaning the better the signal the more likely the candidate is qualified and the lower the signal the more likely the candidate isn’t qualified.) Moreover, gender doesn’t enter the signal distribution at all. Nor does it influence the cost of investment that nature distributes. Nor the payoffs to the employer (as would be the case in the Beckerian model of taste-based discrimination). But… it will still manage to come into play!

How does gender come into play then, you ask? In equilibrium! See, in equilibrium, agents seek to maximize expected payoffs. And, expected payoffs depend on the tech company’s prior probability that the worker is qualifiedp. Tech companies then use p and observed signal to update their beliefs via Bayes’ Rule. So, the company now has some posterior probability, B(s,p), that is a function of p and s. The company’s expected payoff is thus B(s,p)($q) – (1-B(s,p))(-$u) since that is the product of the probability of the candidate’s being qualified and the gain from hiring a qualified candidate less the product of the candidate’s being unqualified and the penalty to hiring an unqualified candidate. The tech company will hire a candidate if that bolded difference is greater than or equal to 0. In effect, the company decision is then characterized by a threshold rule such that they accept applicants with signal greater than or equal to s*(p) such that the expected payoff equals 0. Now, note that this s* is a function of p. That’s because if p changes in the equation B(s,p)($q) – (1-B(s,p))(-$u)=0, there’s now a new s that makes it hold with equality. In effect, tech companies hold different genders to different standards in this model. Namely, it turns out that s*(p) is decreasing in p, which means intuitively that the more pessimistic employer beliefs are about a particular group, the harder the standards that group faces.

So, let’s say, fictionally that tech companies thought, hmmm I don’t know, “the distribution of preferences and abilities of men and women differ in part due to biological causes and that these differences may explain why we don’t see equal representation of women in tech and leadership” [Source: a certain memo]. Such a statement about differential abilities yields a lower p for women than for men. In this model. that means women will face higher standards for employment.

Now, what does that mean for our math-smitten woman who wanted to decide whether to learn to code or not? In this model, workers anticipate standards. Applicants know that if they invest, they receive an amount = (probability of being above standard as a qualified applicant)*w +(probability of falling below standard as a qualified applicant)*0 – c. If they don’t invest, they receive = (probability of being above standard as an unqualified applicant)*w +(probability of falling below standard as an unqualified applicant)*0. Workers invest only if the former is greater than or equal to the latter. If the model’s standard is higher for women than men, as the tech company’s prior probability that women are qualified is smaller than it is for men, then the threshold for investing for women will be higher than it is for men. 

So, if in this model-world, that tech company (with all the ping pong balls) is one of a ton of identical tech companies that believe, for some reason or another, that women are less likely to be qualified than men for jobs in the industry, women are then induced to meet a higher standard for hire. That higher standard, in effect, is internalized by women who then don’t invest as much. In the words of the original paper, “In this way, the employers’ initial negative beliefs are confirmed.”

The equilibrium, therefore, induces worker behavior that legitimizes the original beliefs of the tech companies. This is a case of statistical discrimination that is self-fulfilling. It is an equilibrium that is meant to be broken, but it is incredibly tricky to do so. Once workers have been induced to validate employer beliefs, then those beliefs are correct… and, how do you correct them?

I certainly don’t have the answer. But, on my end, I’ll keep studying models and attempting to shift some peoples’ priors…

Screen Shot 2017-09-06 at 10.56.41 PM

Oh, and my fictional female math-enthusiast will be endowed with as many tech hoodies as she desires. In my imagination, she has escaped the world of this model and found a tech company with a more favorable prior. A girl can dream…

Endnote

This post adapts Coate and Laury (1993) to the case of women in tech in order to demonstrate and summarize the model’s dynamics of self-fulfilling negative stereotypes. Discussion and lecture in Social Economics class informed this post. Note that these ideas need not be focused on gender and tech. They are applicable to many other realms, including negative racial group stereotypes and impacts on a range of outcomes, from mortgage decisions to even police brutality.


© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 

Advertisements

Where My Girls At? (In The Sciences)

Line Charts, Scatter Plots
Intro

In the current educational landscape, there is a constant stream of calls to improve female representation in the sciences. However, the call to action is often framed within the aforementioned nebulous realm of “the sciences”—an umbrella term that ignores the distinct environments across the scientific disciplines. To better understand the true state of women in “the sciences,” we must investigate representation at the discipline level in the context of both undergraduate and doctoral education. As it turns out, National Science Foundation (NSF) open data provides the ability to do just that!

The NSF’s Report on Women, Minorities, and Persons with Disabilities in Science and Engineering includes raw numbers on both undergraduate and doctoral degrees earned by women and men across all science disciplines. With these figures in hand, it’s simple to generate measures of female representation within each field of study—that is, percentages of female degree earners. This NSF report spans the decade 2002–­2012 and provides an immense amount of raw material to investigate.[1]

The static picture: 2012

First, we will zero in on the most recent year of data, 2012, and explicitly compare female representation within and across disciplines.[2]

fig1

The NSF groups science disciplines with similar focus (for example, atmospheric and ocean sciences both focus on environmental science) into classified parent categories. In order to observe not only the variation within each parent category but also across the more granular disciplines themselves, the above graph plots percentage female representation by discipline, with each discipline colored with respect to its NSF classified parent category.

The variation within each parent category can be quite pronounced. In the earth, atmospheric, and ocean sciences, female undergraduate representation ranges from 36% (atmospheric sciences) to 47% (ocean sciences) of total graduates. Among PhD graduates, female representation ranges from 39% (atmospheric sciences) to 48% (ocean sciences). Meanwhile, female representation in the physical sciences has an undergraduate range from 19% (physics) to 47% (chemistry) and a PhD range from 20% (physics) to 39% (chemistry). However, social sciences has the largest spread of all with undergraduate female representation ranging from 30% (economics) to 71% (anthropology) and PhD representation ranging from 33% (economics) to 64% (anthropology).

In line with conventional wisdom, computer sciences and physics are overwhelmingly male (undergraduate and PhD female representation lingers around 20% for both). Other disciplines in which female representation notably lags include: economics, mathematics and statistics, astronomy, and atmospheric sciences. Possible explanations behind the low representation in such disciplines have been debated at length.

Interactions between “innate abilities,” mathematical content, and female representation

Relatively recently, in January 2015, an article in Science “hypothesize[d] that, across the academic spectrum, women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent.” While this explanation was compelling to many, another group of researchers quickly responded by showing that once measures of mathematical content were added into the proposed models, the measures of innate beliefs (based on surveys of faculty members) shed all their statistical significance. Thus, the latter researchers provided evidence that female representation across disciplines is instead associated with the discipline’s mathematical content “and that faculty beliefs about innate ability were irrelevant.”

However, this conclusion does not imply that stereotypical beliefs are unimportant to female representation in scientific disciplines—in fact, the same researchers argue that beliefs of teachers and parents of younger children can play a large role in silently herding women out of math-heavy fields by “becom[ing] part of the self-fulfilling belief systems of the children themselves from a very early age.” Thus, the conclusion only objects to the alleged discovery of a robust causal relationship between one type of belief, university/college faculty beliefs about innate ability, and female representation.

Despite differences, both assessments demonstrate a correlation between measures of innate capabilities and female representation that is most likely driven by (1) women being less likely than men to study math-intensive disciplines and (2) those in math-intensive fields being more likely to describe their capacities as innate.[3]

The second point should hardly be surprising to anyone who has been exposed to mathematical genius tropes—think of all those handsome janitors who write up proofs on chalkboards whose talents are rarely learned. The second point is also incredibly consistent with the assumptions that underlie “the cult of genius” described by Professor Jordan Ellenberg in How Not to Be Wrong: The Power of Mathematical Thinking (p.412):

The genius cult tells students it’s not worth doing mathematics unless you’re the best at mathematics, because those special few are the only ones whose contributions matter. We don’t treat any other subject that way! I’ve never heard a student say, “I like Hamlet, but I don’t really belong in AP English—that kid who sits in the front row knows all the plays, and he started reading Shakespeare when he was nine!”

In short, subjects that are highly mathematical are seen as more driven by innate abilities than are others. In fact, describing someone as a hard worker in mathematical fields is often seen as an implicit insult—an implication I very much understand as someone who has been regularly (usually affectionately) teased as a “try-hard” by many male peers.

The dynamic picture: 2002–2012

Math-intensive subjects are predominately male in the static picture for the year 2012, but how has the gender balance changed over recent years (in these and all science disciplines)? To answer this question, we turn to a dynamic view of female representation over a recent decade by looking at NSF data for the entirety of 2002–2012.

fig2

The above graph plots the percentages of female degree earners in each science discipline for both the undergraduate and doctoral levels for each year from 2002 to 2012. The trends are remarkably varied with overall changes in undergraduate female representation ranging from a decrease of 33.9% (computer sciences) to an increase of 24.4% (atmospheric sciences). Overall changes in doctoral representation ranged from a decline of 8.8% (linguistics) to a rise of 67.6% (astronomy). The following visual more concisely summarizes the overall percentage changes for the decade.

fig3

As this graph illustrates, there were many gains in female representation at the doctoral level between 2002 and 2012. All but three disciplines experienced increased female representation—seems promising, yes? However, substantial losses at the undergraduate level should yield some concern. Only six of the eighteen science disciplines experienced undergraduate gains in female representation over the decade.

The illustrated increases in representation at the doctoral level are likely extensions of gains at the undergraduate level from the previous years—gains that are now being eroded given the presented undergraduate trends. The depicted losses at the undergraduate level could very well lead to similar losses at the doctoral level in the coming decade, which would hamper the widely shared goal to tenure more female professors.

The change for computer sciences is especially important since it provides a basis for the vast, well-documented media and academic focus on women in the field. (Planet Money brought the decline in percentage of female computer science majors to the attention of many in 2014.) The discipline experienced a loss in female representation at the undergraduate level that was more than twice the size of that in any other subject, including physics (-15.6%), earth sciences (-12.2%), and economics (-11.9%).

While the previous discussion of innate talent and stereotype threat focused on math-intensive fields, a category computer sciences fall into, I would argue that this recent decade has seen the effect of those forces on a growing realm of code-intensive fields. The use of computer programming and statistical software has become a standard qualification for many topics in physics, statistics, economics, biology, astronomy, and other fields. In fact, completing degrees in these disciplines now virtually requires coding in some way, shape, or form.

For instance, in my experience, one nontrivial hurdle that stands between students and more advanced classes in statistics or economics is the time necessary to understand how to use software such as R and Stata. Even seemingly simple tasks in these two programs requires some basic level of comfort with structuring commands—an understanding that is not taught in these classes, but rather mentioned as a quick and seemingly obvious sidebar. Despite my extensive coursework in economics and mathematics, I am quick to admit that I only became comfortable with Stata via independent learning in a summer research context, and R via pursuing projects for this blog many months after college graduation.

The implications of coding’s expanding role in many strains of scientific research should not be underestimated. If women are not coding, they are not just missing from computer science—they will increasingly be missing from other disciplines which coding has seeped into.

The big picture: present–future

In other words, I would argue academia is currently faced with the issue of improving female representation in code-intensive fields.[4] As is true with math-intensive fields, the stereotypical beliefs of teachers and parents of younger children “become part of the self-fulfilling belief systems of the children themselves from a very early age” that discourage women from even attempting to enter code-intensive fields. These beliefs when combined with Ellenberg’s described “cult of genius” (a mechanism that surrounded mathematics and now also applies to the atmosphere in computer science) are especially dangerous.

Given the small percentage of women in these fields at the undergraduate level, there is limited potential growth in female representation along the academic pipeline—that is, at the doctoral and professorial levels. While coding has opened up new, incredible directions for research in many of the sciences, its evolving importance also can yield gender imbalances due to the same dynamics that underlie underrepresentation in math-intensive fields.

Footnotes

[1] Unfortunately, we cannot extend this year range back before 2002 since earlier numbers were solely presented for broader discipline categories, or parent science categories—economics and anthropology would be grouped under the broader term “social sciences,” while astronomy and chemistry would be included under the term “physical sciences.”

[2] The NSF differentiates between science and engineering as the latter is often described as an application of the former in academia. While engineering displays an enormous gender imbalance in favor of men, I limit my discussion here to disciplines that fall under the NSF’s science category.

[3] The latter viewpoint does have some scientific backing. The paper “Nonlinear Psychometric Thresholds for Physics and Mathematics” supports the notion that while greater work ethic can compensate for lesser ability in many subjects, those below some threshold of mathematical capacities are very unlikely to succeed in mathematics and physics coursework.

[4] On a positive note, atmospheric sciences, which often involves complex climate modeling techniques, has experienced large gains in female representation at the undergraduate level.

Speaking of coding…

Check out my relevant Github repository for all data and R scripts necessary for reproducing these visuals.

Thank you to:

Ally Seidel for all the edits over the past few months! & members of NYC squad for listening to my ideas and debating terminology with me.


© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.