The Curious Case of The Illinois Trump Delegates

Scatter Plots
Intro

This past Wednesday, after watching Hillary Clinton slow-motion strut into the Broad City universe and realizing that this election has successfully seeped into even the most intimate of personal rituals, I planned to go to sleep without thinking any more about the current presidential race. However, somewhere in between Ilana’s final “yas queen” and hitting the pillow, I saw David Wasserman’s FiveThirtyEight article “Trump Voters’ Aversion To Foreign-Sounding Names Cost Him Delegates.”

Like many readers, I was immediately drawn to the piece’s fundamentally ironic implication that Trump could have lost delegates in Illinois due to the very racial resentment that he espouses and even encourages among his supporters. The possibility that this could be more deeply investigated was an energizing idea, which had already inspired Evan Soltas to do just that as well as make public his rich-in-applications-and-possibilities dataset. With this dataset in hand, I set out to complement the ideas from the Wasserman and Soltas articles by building some visual evidence. (Suffice it to say I did not end up going to sleep for a while.)

To contribute to the meaningful work that the two articles have completed, I will first quickly outline their scope and conclusions, and then present the visuals I’ve built using Soltas’ publicly available data. Consider this a politically timely exercise in speedy R scripting!

Wasserman’s FiveThirtyEight piece & Soltas’ blog post

In the original article of interest, Wasserman discusses the noteworthy Illinois Republican primary. He explains that,

Illinois Republicans hold a convoluted “loophole” primary: The statewide primary winner earns 15 delegates, but the state’s other 54 delegates are elected directly on the ballot, with three at stake in each of the state’s 18 congressional districts. Each campaign files slates of relatively unknown supporters to run for delegate slots, and each would-be delegate’s presidential preference is listed beside his or her name.

Given that the delegates are “relatively unknown,” one would assume that delegates in the same district who list the same presidential preference would earn similar numbers of votes. However, surprisingly, Wasserman found that this did not seem to be the case for Trump delegates. In fact, there is a striking pattern in the Illinois districts with the 12 highest vote differentials: “[i]n all 12 cases, the highest vote-getting candidate had a common, Anglo-sounding name” while “a majority of the trailing candidates had first or last names most commonly associated with Asian, Hispanic or African-American heritages.” These findings, while admittedly informal, strongly suggest that Trump supporters are racially biased in their delegate voting behaviors.

Soltas jumps into this discussion by first creating a dataset on all 458 people who ran for Illinois Republican delegate spots. He merges data on the individuals’ names, districts, and candidate representation with a variable that could be described as a measure of perceived whiteness–the non-Hispanic white percentage of the individual’s last name, as determined from 2000 US Census data. The inclusion of this variable is what makes the dataset so exciting (!!!) since, as Soltas explains, this gives us an “objective measure to test the phenomenon Wasserman discovered.”

The article goes on to confirm the legitimacy of Wasserman’s hypothesis. In short, “Trump delegates won significantly more votes when they had ‘whiter’ last names relative to other delegates in their district” and this type of effect does not exist for the other Republicans.

Visual evidence time

I now present a few visuals I generated using the aforementioned dataset to see Soltas’ conclusions for myself. First things first, it’s important to note that some grand underlying mechanism does not jump out at you when you simply look at the association between perceived whiteness and vote percentage for all of Trump’s Illinois delegates:

fig1

The above graph does not suggest any significant relationship between these two numbers attached to each individual delegate. This is because the plot shows delegates across all different districts, which will vote for Trump at different levels, but compares their absolute variable levels. What we actually care about is comparing voting percentages within the same district, but across different individuals who all represent the same presidential hopeful. In other words, we need to think about the delegates relative to their district-level context. To do this, I calculate vote percentages and whiteness measures relative to the district: the percentage point difference between a Trump delegate’s vote|whiteness percentage and the average Trump delegate vote|whiteness percentage in that district. (Suggestions welcome on different ways of doing this for visualization’s sake!)
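The district-relative calculation described above can be sketched in a few lines. This is a minimal Python illustration, not the R script from the repo; the field names (`district`, `vote_pct`, `white_pct`) and the toy rows are hypothetical and are not taken from Soltas’ dataset. The rows are assumed to be already filtered to a single candidate’s delegates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows for one candidate's delegates.
# The numbers are made up for illustration; they are NOT from Soltas' data.
delegates = [
    {"district": 1, "name": "A", "vote_pct": 12.0, "white_pct": 90.0},
    {"district": 1, "name": "B", "vote_pct": 11.0, "white_pct": 80.0},
    {"district": 1, "name": "C", "vote_pct": 10.0, "white_pct": 40.0},
    {"district": 2, "name": "D", "vote_pct": 20.0, "white_pct": 95.0},
    {"district": 2, "name": "E", "vote_pct": 18.0, "white_pct": 85.0},
]

def relative_to_district(rows, key):
    """Return each row's value of `key` minus its district's mean of `key`."""
    by_district = defaultdict(list)
    for row in rows:
        by_district[row["district"]].append(row[key])
    district_mean = {d: mean(vals) for d, vals in by_district.items()}
    return [row[key] - district_mean[row["district"]] for row in rows]

# Percentage point differences from the district average.
rel_vote = relative_to_district(delegates, "vote_pct")
rel_white = relative_to_district(delegates, "white_pct")

for row, dv, dw in zip(delegates, rel_vote, rel_white):
    print(row["district"], row["name"], dv, dw)
```

Subtracting the district mean removes the fact that some districts vote for the candidate at higher levels overall, so only within-district differences between delegates remain.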

fig2

Now that we are measuring these variables (vote percentage and whiteness measure) relative to the district, there is a statistically significant association beyond even the 0.1% level. (The simple linear regression Y~X in this case yields a t-statistic of 5.4!) In the end, the interpretation of the simplistic linear regression is that a 10 percentage point increase in a Trump delegate’s perceived whiteness relative to the district is associated with a 0.12 percentage point increase in the delegate’s vote percentage relative to the district. (I’m curious if people think there is a better way to take district levels into account for these visuals–let me know if you have any thoughts that yield a simpler coefficient interpretation!)
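For readers who want to see where a slope and its t-statistic come from, here is a minimal sketch of a simple linear regression written with the Python standard library. It is not the R script from the repo, and the five data points are hypothetical values chosen only to make the arithmetic easy to follow; they will not reproduce the 5.4 reported above.

```python
from math import sqrt
from statistics import mean

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (intercept, slope, slope t-statistic)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

# Hypothetical district-relative measures (percentage points), for illustration only.
rel_white = [-30.0, -10.0, 0.0, 10.0, 30.0]
rel_vote = [-0.5, 0.0, 0.1, 0.0, 0.4]

intercept, slope, t_stat = simple_ols(rel_white, rel_vote)

# Change in relative vote percentage associated with a 10 point change in relative whiteness.
ten_point_change = 10 * slope
print(slope, t_stat, ten_point_change)
```

The same multiplication underlies the interpretation in the text: an association of 0.12 percentage points per 10 points of relative whiteness corresponds to a slope of 0.012 per point.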

The last dimension of this discussion requires comparing Trump to the other Republican candidates. Given the media’s endless coverage of Trump, I would not have been surprised to learn that this effect impacts other campaigns but just was never reported. But Wasserman and Soltas argue that this is not the case. Their claims are further bolstered by the following visual, which recreates the most recent Trump plot for all 9 candidates who had sufficient data (excludes Gilmore, Huckabee, and Santorum):

fig3

It should be immediately clear that Trump is the only candidate for whom there is a positive statistically significant association between the two relative measures. While Kasich has an upward sloping regression line, the corresponding 95% confidence interval demonstrates that the coefficient on relative perceived whiteness is not statistically significantly different from 0. Employing the whiteness measure in this context allows us to provide quantitative evidence for Wasserman’s original intuition that this effect is unique to Trump–thus, “lend[ing] credibility to the theory that racial resentment is commonplace among his supporters.”
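The check being described, whether a slope’s 95% confidence interval excludes zero, can be sketched per candidate as follows. This is a rough Python illustration under stated assumptions: the candidate labels "X" and "Y" and all data points are hypothetical, and the interval uses a normal approximation (a critical value of roughly 1.96), whereas a t critical value would be more appropriate for samples this small.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean

def slope_with_ci(x, y, level=0.95):
    """Return (slope, lower, upper) for y = a + b*x using a normal-approximation interval."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return slope, slope - z * se, slope + z * se

# Hypothetical rows: (candidate, relative whiteness, relative vote %). Not real data.
rows = [
    ("X", -30.0, -0.5), ("X", -10.0, 0.0), ("X", 0.0, 0.1), ("X", 10.0, 0.0), ("X", 30.0, 0.4),
    ("Y", -20.0, 0.3), ("Y", -10.0, -0.4), ("Y", 0.0, 0.2), ("Y", 10.0, -0.3), ("Y", 20.0, 0.2),
]

by_candidate = defaultdict(lambda: ([], []))
for candidate, rel_white, rel_vote in rows:
    by_candidate[candidate][0].append(rel_white)
    by_candidate[candidate][1].append(rel_vote)

positive_and_significant = {}
for candidate, (xs, ys) in by_candidate.items():
    slope, lower, upper = slope_with_ci(xs, ys)
    # A positive, significant association means the whole interval lies above zero.
    positive_and_significant[candidate] = lower > 0
    print(candidate, round(slope, 4), round(lower, 4), round(upper, 4))
```

In this toy data the interval for "X" sits entirely above zero while the interval for "Y" straddles it, which is the same distinction drawn between Trump and Kasich in the plot.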

The role of perceptions of whiteness

Wasserman’s article has incited an outpouring of genuine interest over the past few days. The fascinating nature of the original inquiry combined with Soltas’ integration of a perceived whiteness measure into the Illinois delegate dataset provides a perfect setting in which to investigate the role racial resentment is playing in these particular voting patterns, and in the election on the whole.

Code

My illinois_delegates Github repo has the R script and csv file necessary to replicate all three visuals! (We know data, we have the best data.)


© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

 

How I Learned to Stop Worrying and Love Economics

Words, words, words
Intro

Many months ago, in October, the Economics Nobel prize was awarded to Angus Deaton. Beyond experiencing sheer joy at having beaten my friend Mike at predicting the winner, I also was overwhelmed by the routine, yearly backlash against the discipline in the form of articles shared widely across any and all social networks. Of particular interest to me this year was the Guardian’s piece “Don’t let the Nobel prize fool you. Economics is not a science.” The dialogue surrounding this article made me incredibly curious to investigate my own thoughts on the discipline and its place in the realm of “the sciences.” In a frenzy of activity that can only be accurately explained as the result of a perfect storm of manic energy and genuine love for an academic topic, I wrote up a response not only to this article, but also to my own sense of insecurity in studying a discipline that is often cut down to size by the public and other academics.

In my aforementioned frenzy of activity, I found myself constantly talking with Mike (in spite of my status as the superior Nobel forecaster) about the definition of science, hierarchies of methodologies for causal inference, the role of mathematics in applied social science, and our own personal experiences with economics. Eventually, I linked the Guardian article to him in order to explain the source of my academic existential probing. As another economics researcher, Mike had a similarly strong reaction to reading the Guardian’s piece and ended up writing his own response as well.

So, I am now (albeit months after the original discussion) using this space to post both responses. I hope you’ll humor some thoughts and reactions from two aspiring economists.

Alex responds

I developed a few behavioral tics in college when asked about my major. First, I would blurt out “Math” and, after a brief pause of letting the unquestioned legitimacy of that discipline settle in, I would add “and Econ!”–an audible exclamation point in my voice. I had discovered through years of experience that the more enthusiastic you sounded, the less likely someone would take a dig at your field. Nonetheless, I would always brace myself for cutting criticism as though the proofs I attempted to complete in Advanced Microeconomics were themselves the lynchpin of the financial crisis.

In the court of public opinion, economics is often misunderstood as the get-rich-quick major synonymous with Finance. The basic assumptions of self-interest and rationality that the discipline gives its theoretical actors are stamped onto its practitioners and relabeled as hubris and heartlessness. Very few students are seeking out dreamy economics majors to woo them with illustrations of utility functions in which time spent together is a variable accompanied by a large positive coefficient. (The part where you explain that there is also a squared term with a negative coefficient since the law of diminishing marginal utility still applies is not as adorable. Or so I’ve been told.)

It can be hard to take unadulterated pride in a subject that individuals on all sides of the techie/fuzzy or quant/qual spectrum feel confident to discredit so openly. Economics is an outsider to many different categories of academic study; it is notably more focused on quantitative techniques than are other social sciences but its applications are to human phenomena, which rightfully ousts it from the exclusive playground of the hard sciences. I admit I have often felt awkward or personally slighted when accosted by articles like Joris Luyendijk’s “Don’t let the Nobel prize fool you. Economics is not a science,” which readily demeans contributions to economics both by appealing to the unsexiness of technical jargon and by contrasting these contributions with the literature and peace prizes:

Think of how frequently the Nobel prize for literature elevates little-known writers or poets to the global stage, or how the peace prize stirs up a vital global conversation: Naguib Mahfouz’s Nobel introduced Arab literature to a mass audience, while last year’s prize for Kailash Satyarthi and Malala Yousafzai put the right of all children to an education on the agenda. Nobel prizes in economics, meanwhile, go to “contributions to methods of analysing economic time series with time-varying volatility” (2003) or the “analysis of trade patterns and location of economic activity” (2008).

While comparing strides in economic methods to the contributions of peace prize recipients is akin to comparing apples to dragon fruit, Luyendijk does have a point that “[m]any economists seem to have come to think of their field in scientific terms: a body of incrementally growing objective knowledge.” When I first started playing around with regressions in Stata as a sophomore in college, I was working under the implicit assumption that there was one model I was seeking out. My different attempted specifications were the statistical equivalent of an archeologist’s whisks of ancient dust off of some fascinating series of bones. I assumed the skeleton would eventually peek out from the ground, undisputedly there for all to see. I assumed this was just like how there was one theorem I was trying to prove in graph theory–sure, there were multiple modes of axiomatic transport available to end up there, but we were bound to end up in the same place (unless, of course, I fell asleep in snack bar before I could really get there). I quickly realized that directly transplanting mathematical and statistical notions into the realm of social science can lead to numbers and asterisks denoting statistical significance floating around in zero gravity with nothing to pin them down. Tying the 1’s, 3’s, and **’s down requires theory and we, as economic actors ourselves who perpetually seek optimal solutions, often entertain the fantasy of a perfectly complex and complete model that could smoothly trace the outline and motions of our dynamic, imperfect society.

However, it is exactly Luyendijk’s point that “human knowledge about humans is fundamentally different from human knowledge about the natural world” that precludes this type of exact clean solution to fundamentally human questions in economics–a fact that has irked and continues to irk me, if not simply because of the limitations of computational social science, then because of the imperfection and incompleteness of human knowledge (even of our own societies, incentives, and desires) of which it reminds me. Yet, as I have spent more and more time steeped in the world of economics, I have come to confidently argue that the lack of one incredibly complex model that manages to encapsulate “timeless truth[s]” about human dynamics does not mean models or quantitative methods have no place in the social sciences. Professor Dani Rodrik, in probably my favorite piece of writing on economics this past year, writes that,

Jorge Luis Borges, the Argentine writer, once wrote a short story – a single paragraph – that is perhaps the best guide to the scientific method. In it, he described a distant land where cartography – the science of making maps – was taken to ridiculous extremes. A map of a province was so detailed that it was the size of an entire city. The map of the empire occupied an entire province.

In time, the cartographers became even more ambitious: they drew a map that was an exact, one-to-one replica of the whole empire. As Borges wryly notes, subsequent generations could find no practical use for such an unwieldy map. So the map was left to rot in the desert, along with the science of geography that it represented.

Borges’s point still eludes many social scientists today: understanding requires simplification. The best way to respond to the complexity of social life is not to devise ever-more elaborate models, but to learn how different causal mechanisms work, one at a time, and then figure out which ones are most relevant in a particular setting.

In this sense, “focusing on complex statistical analyses and modeling” does not have to be to “the detriment of the observation of reality,” as Luyendijk states. Instead, emulating the words of Gary King, theoretical reasons for models can serve as guides to our specifications.

In my mind, economics requires not just the capability to understand economic theory and empirics, but also the humility to avoid mapping out the entire universe of possible economic interactions, floating coefficients, and Greek letters. Studying economics requires the humility to admit that economics itself is not an exact science, but also the understanding that this categorization does not lessen the impact of potential breakthroughs, just maybe the egos of researchers like myself.


via xkcd. WHERE IS ECONOMICS?

Mike responds

Economics is an incredibly diverse field, studying topics ranging from how match-fixing works among elite sumo wrestlers to why the gap between developed and developing countries is as large as it is. When considering a topic as broad as whether the field of economics deserves to have a Nobel prize, then, it is important to consider the entire field before casting judgment.

Joris Luyendijk, in his article “Don’t let the Nobel prize fool you. Economics is not a science,” directs most of his criticisms of economics at financial economics specifically instead of addressing the field of economics as a whole. We can even use Mr. Luyendijk’s preferred frame of analysis, Nobel prizes awarded, to see the distinction between finance and economics. Out of the 47 times the economics Nobel has been awarded, it has been given in the field of financial economics only three times. And in his article, Mr. Luyendijk only addresses one of these three Nobels. I would argue that since financial economics is but a small part of the entire economics field, even intense criticism of financial economics should not bring the entire economics field down with it.

A closer look at the Nobels awarded in financial economics reveals that the award is not “fostering hubris and leading to disaster” as Mr. Luyendijk claims. The first Nobel awarded in financial economics was presented in 1990, for research on portfolio choice and corporate finance and the creation of the Capital Asset Pricing Model (CAPM). Far from causing the financial contagion to which Mr. Luyendijk hints the economics Nobel prize has contributed, optimal portfolio theory examines how to balance returns and risk, and CAPM provides a foundation for pricing in financial markets. More recently, the 2013 Nobel was again awarded in financial economics, for advances in understanding asset pricing in the short and long term, applications of which include the widely used Case-Shiller Home Price Index.

The second Nobel awarded for financial economics, to Merton and Scholes in 1997, does deserve some criticism. However, I would argue that the Black-Scholes option pricing model gained traction long before the 1997 Nobel Prize, and continues to be used long after the collapse of the hedge fund Merton and Scholes were part of, because of its practical usefulness and not because of any legitimacy the Nobel prize might have endowed it with. The quantification of finance would have happened with or without the Nobel prize, and I find it hard to believe that the existence of the economics Nobel prize causes profit-driven financiers to blindly believe that the Black-Scholes formula is a “timeless truth.”

So if economics is not finance, then what is it? I would argue that an identifying feature of applied economics research is the search for causality. Specifically, much of economics is a search for causality in man-made phenomena. To model human behavior in a tractable way requires making assumptions and simplifications. I have to agree with Mr. Luyendijk that economics needs to be more forthright about those assumptions and limitations – economists may be too eager to take published findings as “timeless truths” without thinking about the inherent limitations of those findings.

Failing to realize the limitations of such findings can come back to bite. For example, the Black-Scholes model assumes that securities prices follow a log-normal process, which underestimates the probability of extreme events, such as the ones that led to the collapse of Long-Term Capital Management. But the failure of some to pay attention to well-known limitations of important findings should not diminish economics as a whole.

Applied economics is also distinct from other social sciences in that it attempts to apply the tools of the hard sciences to human problems. I agree with Alex and Mr. Luyendijk that knowledge about the physical and human worlds is inherently different. The heterogeneity of human behavior creates messy models, and these models require the creation of new mathematical and statistical methods to understand them. This “mathematical sophistication” that Mr. Luyendijk bemoans is not just math for math’s sake; it is using tools from the hard sciences to explain real-world phenomena (and what’s wrong with pure math anyways?).

Despite the occasional messy solution, the ideal study in applied economics is still a controlled experiment, as it is in many hard sciences. In the human world, however, this experimental ideal is difficult to implement. Much of applied economics thus relies on quasi-experimental methods, trying to approximate experiments with observational data by finding natural experiments, for example, when controlled experiments are not feasible. Still other branches of economics use actual economic experiments, such as randomized controlled trials (RCTs). The idea behind economics RCTs is the same as that behind clinical drug trials, where people are randomly separated into treatment and control groups to test the effect of an intervention. RCTs have become increasingly popular, especially in development work, over the past decade or so. Given Mr. Luyendijk’s concern about how divorced from the real world economics has become, he would be impressed by the amount of practical, detailed planning required to successfully implement RCTs, and be taken aback by how different this fieldwork is from the academics spending all day thinking of complex and impractical models that he envisions.
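The logic of random assignment can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the effect size, the sample size, and the way outcomes are generated are all hypothetical, and real RCTs involve far more design and fieldwork than this. The point it shows is that, because assignment is random, a simple difference in group means recovers the effect of the intervention on average.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.5  # hypothetical effect of the intervention, chosen for illustration
N = 2000           # hypothetical number of participants

# Randomly assign half of the participants to the treatment group.
people = list(range(N))
random.shuffle(people)
treated = set(people[: N // 2])

# Each person has an unobserved baseline; treatment adds TRUE_EFFECT to the outcome.
outcomes = {}
for person in range(N):
    baseline = random.gauss(0, 1)
    outcomes[person] = baseline + (TRUE_EFFECT if person in treated else 0.0)

treat_mean = mean(outcomes[p] for p in range(N) if p in treated)
control_mean = mean(outcomes[p] for p in range(N) if p not in treated)

# Difference in means between treatment and control estimates the effect.
estimate = treat_mean - control_mean
print(round(estimate, 3))
```

Randomization is what makes the two groups comparable: baselines differ from person to person, but not systematically between the groups.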

A Nobel prize in economics will probably be awarded for advances in the methodology and applications of RCTs, the closest economics can come to the hard sciences that Mr. Luyendijk so reveres, sometime in the next decade. What will he say then?

Endnote

Mike and I were Research Assistants at Williams College together during summer 2013. Mike is currently on a Fulbright in China working with Stanford’s Rural Education Action Program, which conducts RCTs in rural China. We are both happy to hear any feedback on the linked articles and our responses, as we are both genuinely interested in thinking through where economics (and computational social sciences on the whole) should belong in scientific dialogue.

