How I Learned to Stop Worrying and Love Economics

Intro

Many months ago, in October, the Economics Nobel prize was awarded to Angus Deaton. Beyond experiencing sheer joy at having beaten my friend Mike at predicting the winner, I also was overwhelmed by the routine, yearly backlash against the discipline in the form of articles shared widely across any and all social networks. Of particular interest to me this year was the Guardian’s piece “Don’t let the Nobel prize fool you. Economics is not a science.” The dialogue surrounding this article made me incredibly curious to investigate my own thoughts on the discipline and its place in the realm of “the sciences.” In a frenzy of activity that can only be accurately explained as the result of a perfect storm of manic energy and genuine love for an academic topic, I wrote up a response not only to this article, but also to my own sense of insecurity in studying a discipline that is often cut down to size by the public and other academics.

In my aforementioned frenzy of activity, I found myself constantly talking with Mike (in spite of my status as the superior Nobel forecaster) about the definition of science, hierarchies of methodologies for causal inference, the role of mathematics in applied social science, and our own personal experiences with economics. Eventually, I linked the Guardian article to him in order to explain the source of my academic existential probing. As another economics researcher, Mike had a similarly strong reaction to reading the Guardian’s piece and ended up writing his own response as well.

So, I am now (albeit months after the original discussion) using this space to post both responses. I hope you’ll humor some thoughts and reactions from two aspiring economists.

Alex responds

I developed a few behavioral tics in college when asked about my major. First, I would blurt out “Math” and, after a brief pause of letting the unquestioned legitimacy of that discipline settle in, I would add “and Econ!”–an audible exclamation point in my voice. I had discovered through years of experience that the more enthusiastic you sounded, the less likely someone would take a dig at your field. Nonetheless, I would always brace myself for cutting criticism as though the proofs I attempted to complete in Advanced Microeconomics were themselves the lynchpin of the financial crisis.

In the court of public opinion, economics is often misunderstood as the get-rich-quick major synonymous with Finance. The basic assumptions of self-interest and rationality that the discipline gives its theoretical actors are stamped onto its practitioners and relabeled as hubris and heartlessness. Very few students are seeking out dreamy economics majors to woo them with illustrations of utility functions in which time spent together is a variable accompanied by a large positive coefficient. (The part where you explain that there is also a squared term with a negative coefficient since the law of diminishing marginal utility still applies is not as adorable. Or so I’ve been told.)

It can be hard to take unadulterated pride in a subject that individuals on all sides of the techie/fuzzy or quant/qual spectrum feel confident to discredit so openly. Economics is an outsider to many different categories of academic study; it is notably more focused on quantitative techniques than are other social sciences but its applications are to human phenomena, which rightfully ousts it from the exclusive playground of the hard sciences. I admit I have often felt awkward or personally slighted when accosted by articles like Joris Luyendijk’s “Don’t let the Nobel prize fool you. Economics is not a science.” which readily demeans contributions to economics both by appealing to the unsexiness of technical jargon and by contrasting these contributions with the literature and peace prizes:

Think of how frequently the Nobel prize for literature elevates little-known writers or poets to the global stage, or how the peace prize stirs up a vital global conversation: Naguib Mahfouz’s Nobel introduced Arab literature to a mass audience, while last year’s prize for Kailash Satyarthi and Malala Yousafzai put the right of all children to an education on the agenda. Nobel prizes in economics, meanwhile, go to “contributions to methods of analysing economic time series with time-varying volatility” (2003) or the “analysis of trade patterns and location of economic activity” (2008).

While comparing strides in economic methods to the contributions of peace prize recipients is akin to comparing apples to dragon fruit, Luyendijk does have a point that “[m]any economists seem to have come to think of their field in scientific terms: a body of incrementally growing objective knowledge.” When I first started playing around with regressions in Stata as a sophomore in college, I was working under the implicit assumption that there was one model I was seeking out. My different attempted specifications were the statistical equivalent of an archeologist’s whisks of ancient dust off of some fascinating series of bones. I assumed the skeleton would eventually peek out from the ground, undisputedly there for all to see. I assumed this was just like how there was one theorem I was trying to prove in graph theory–sure, there were multiple modes of axiomatic transport available to end up there, but we were bound to end up in the same place (unless, of course, I fell asleep in snack bar before I could really get there). I quickly realized that directly transplanting mathematical and statistical notions into the realm of social science can lead to numbers and asterisks denoting statistical significance floating around in zero gravity with nothing to pin them down. Tying the 1’s, 3’s, and **’s down requires theory and we, as economic actors ourselves who perpetually seek optimal solutions, often entertain the fantasy of a perfectly complex and complete model that could smoothly trace the outline and motions of our dynamic, imperfect society.

However, it is exactly Luyendijk’s point that “human knowledge about humans is fundamentally different from human knowledge about the natural world” that precludes this type of exact clean solution to fundamentally human questions in economics–a fact that has irked and continues to irk me, if not simply because of the limitations of computational social science, then because of the imperfection and incompleteness of human knowledge (even of our own societies, incentives, and desires) of which it reminds me. Yet, as I have spent more and more time steeped in the world of economics, I have come to confidently argue that the lack of one incredibly complex model that manages to encapsulate “timeless truth[s]” about human dynamics does not mean models or quantitative methods have no place in the social sciences. Professor Dani Rodrik, in probably my favorite piece of writing on economics this past year, writes that,

Jorge Luis Borges, the Argentine writer, once wrote a short story – a single paragraph – that is perhaps the best guide to the scientific method. In it, he described a distant land where cartography – the science of making maps – was taken to ridiculous extremes. A map of a province was so detailed that it was the size of an entire city. The map of the empire occupied an entire province.

In time, the cartographers became even more ambitious: they drew a map that was an exact, one-to-one replica of the whole empire. As Borges wryly notes, subsequent generations could find no practical use for such an unwieldy map. So the map was left to rot in the desert, along with the science of geography that it represented.

Borges’s point still eludes many social scientists today: understanding requires simplification. The best way to respond to the complexity of social life is not to devise ever-more elaborate models, but to learn how different causal mechanisms work, one at a time, and then figure out which ones are most relevant in a particular setting.

In this sense, “focusing on complex statistical analyses and modeling” does not have to be to “the detriment of the observation of reality,” as Luyendijk states. Instead, to echo the words of Gary King, theoretical reasons for models can serve as guides to our specifications.

In my mind, economics requires not just the capability to understand economic theory and empirics, but also the humility to avoid mapping out the entire universe of possible economic interactions, floating coefficients, and Greek letters. Studying economics requires the humility to admit that economics itself is not an exact science, but also the understanding that this categorization does not lessen the impact of potential breakthroughs, just maybe the egos of researchers like myself.

via xkcd. WHERE IS ECONOMICS?

Mike responds

Economics is an incredibly diverse field, studying topics ranging from how match-fixing works among elite sumo wrestlers to why the gap between developed and developing countries is as large as it is. When considering a topic as broad as whether the field of economics deserves to have a Nobel prize, then, it is important to consider the entire field before casting judgment.

Joris Luyendijk, in his article “Don’t let the Nobel prize fool you. Economics is not a science,” directs most of his criticisms of economics at financial economics specifically instead of addressing the field of economics as a whole. We can even use Mr. Luyendijk’s preferred frame of analysis, Nobel prizes awarded, to see the distinction between finance and economics. Out of the 47 times the economics Nobel has been awarded, it was only given in the field of Financial Economics three times.  And in his article, Mr. Luyendijk only addresses one of these three Nobels. I would argue that since financial economics is but a small part of the entire economics field, even intense criticism of financial economics should not bring the entire economics field down with it.

A closer look at the Nobels awarded in financial economics reveals that the award is not “fostering hubris and leading to disaster” as Mr. Luyendijk claims. The first Nobel awarded in financial economics was presented in 1990, for research on portfolio choice and corporate finance and the creation of the Capital Asset Pricing Model (CAPM). Far from causing the financial contagion that Mr. Luyendijk hints the economics Nobel prize has contributed to, optimal portfolio theory examines how to balance returns and risk, and CAPM provides a foundation for pricing in financial markets. More recently, the 2013 Nobel was again awarded in financial economics, for advances in understanding asset pricing in the short and long term, applications of which include the widely used Case-Shiller Home Price Index.

The second Nobel awarded for financial economics, to Merton and Scholes in 1997, does deserve some criticism. However, I would argue that the Black-Scholes asset pricing model gained traction long before the 1997 Nobel Prize, and continues to be used long after the collapse of the hedge fund Merton and Scholes were part of, because of its practical usefulness and not because of any legitimacy the Nobel prize might have endowed it with. The quantification of finance would have happened with or without the Nobel prize, and I find it hard to believe that the existence of the economics Nobel prize causes profit-driven financiers to blindly believe that the Black-Scholes formula is a “timeless truth.”

So if economics is not finance, then what is it? I would argue that an identifying feature of applied economics research is the search for causality. Specifically, much of economics is a search for causality in man-made phenomena. To model human behavior in a tractable way requires making assumptions and simplifications. I have to agree with Mr. Luyendijk that economics needs to be more forthright about those assumptions and limitations – economists may be too eager to take published findings as “timeless truths” without thinking about the inherent limitations of those findings.

Failing to realize the limitations of such findings can come back to bite. For example, the Black-Scholes model assumes that securities prices follow a log-normal process, which underestimates the probability of extreme events, such as the ones that led to the collapse of Long-Term Capital Management. But the failure of some to pay attention to well-known limitations of important findings should not diminish economics as a whole.
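To put a rough number on that tail-risk point, here is a minimal Python sketch. It is not Black-Scholes pricing itself; it only asks how often a large move should happen if returns really were normally distributed (which is what a log-normal price process implies for log returns). The function name and the sigma thresholds are illustrative assumptions of mine.

```python
import math

def normal_tail_probability(k: float) -> float:
    """Two-sided probability that a standard normal draw lands
    more than k standard deviations away from its mean."""
    # P(|Z| > k) = erfc(k / sqrt(2)) for a standard normal Z
    return math.erfc(k / math.sqrt(2))

# Illustrative thresholds (assumed, not taken from any market data)
for k in (1, 3, 5):
    print(f"|move| > {k} sigma: {normal_tail_probability(k):.2e}")
```

Under the normality assumption, a five-sigma move should show up less than once in a million draws. If real markets produce such moves more often than that, a model built on the assumption will understate the risk.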

Applied economics is also distinct from other social sciences in that it attempts to apply the tools of the hard sciences to human problems. I agree with Alex and Mr. Luyendijk that knowledge about the physical and human worlds is inherently different. The heterogeneity of human behavior creates messy models, and these models require the creation of new mathematical and statistical methods to understand them. This “mathematical sophistication” that Mr. Luyendijk bemoans is not just math for math’s sake; it is using tools from the hard sciences to explain real-world phenomena (and what’s wrong with pure math anyways?).

Despite the occasional messy solution, the ideal study in applied economics is still a controlled experiment, as it is in many hard sciences. In the human world, however, this experimental ideal is difficult to implement. Much of applied economics thus relies on quasi-experimental methods, trying to approximate experiments with observational data by finding natural experiments, for example, when controlled experiments are not feasible. Still other branches of economics use actual economic experiments, such as randomized controlled trials (RCTs). The idea behind economics RCTs is the same as that behind clinical drug trials, where people are randomly separated into treatment and control groups to test the effect of an intervention. RCTs have become increasingly popular, especially in development work, over the past decade or so. Given Mr. Luyendijk’s concern about how divorced from the real world economics has become, he would be impressed by the amount of practical, detailed planning required to successfully implement RCTs, and be taken aback by how different this fieldwork is from the academics spending all day thinking of complex and impractical models that he envisions.
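The logic of random assignment can be sketched in a few lines of Python. This is a toy simulation under assumptions I made up for illustration (a baseline outcome drawn from a normal distribution, a constant treatment effect, a coin-flip assignment); it is not a description of how any actual field RCT is run or analyzed.

```python
import random
import statistics

def simulate_rct(n_people: int, true_effect: float, seed: int = 0) -> float:
    """Randomly assign people to treatment or control and return
    the difference in mean outcomes between the two groups."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_people):
        # Hypothetical outcome each person would have without the intervention
        baseline = rng.gauss(50, 10)
        if rng.random() < 0.5:  # coin-flip assignment to treatment
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_rct(n_people=10_000, true_effect=5.0)
print(f"estimated effect: {estimate:.2f}")
```

Because assignment is random, the two groups have the same baseline outcomes on average, so the difference in group means should land close to the assumed effect of 5. That is the property observational data cannot guarantee.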

A Nobel prize in economics will probably be awarded for advances in the methodology and applications of RCTs, the closest economics can come to the hard sciences that Mr. Luyendijk so reveres, sometime in the next decade. What will he say then?

Endnote

Mike and I were Research Assistants at Williams College together during summer 2013. Mike is currently on a Fulbright in China working with Stanford’s Rural Education Action Program, which conducts RCTs in rural China. We are both happy to hear any feedback on the linked articles and our responses, as we are both genuinely interested in thinking through where economics (and computational social sciences on the whole) should belong in scientific dialogue.


© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

This Post is Brought to You by the National Science Foundation

Intro

I have officially finished applying for my PhD. While the application process included many of the same elements that I had previously encountered as a fresh-faced* 17-year-old (think standardized testing without the #2 pencils and lots more button clicking), I am no longer applying as a (relatively) blank slate–a future liberal arts student who will float and skip between disciplines until being neatly slotted into a major. Instead, we PhD applicants have already zeroed in on a particular area of study–in my case, economics. Consequently, each PhD discipline is unlikely to exhibit the same carefully crafted demographics boasted in the pie charts that plaster undergraduate brochures across the country to provide tangible evidence for optimistic, bolded statements about diversity. In formulating responses to a slew of university-specific prompts about diversity in “the sciences,” I grew curiouser and curiouser about two particular questions: What do demographic compositions look like across various PhD disciplines in the sciences? & Have demographic snapshots changed meaningfully over time?

As I continued working to imbue a sense of [academic] self into pdfs composed of tightly structured Times New Roman 12 point font, I repeatedly found myself at the NSF open data portal, seeking to answer these aforementioned questions. However, I would then remind myself that, despite my organic urge to load rows and columns into RStudio, I should be the responsible adult (who I know I can be) and finish my applications before running out to recess. Now that the last of the fateful buttons has been clicked (and a sizable portion of my disposable income has been devoured by application fees and the testing industrial complex), I’m outside and ready to talk science!**

NSF data and sizes of “the sciences”

In this post, I am focusing on the demographics of science PhD degrees awarded as they pertain to citizenship and race/ethnicity, but not gender. In an ideal world, I would be able to discuss the compositions of PhD fields as broken into race/ethnicity-gender combinations; however, the table that includes these types of combinations for US citizens and permanent residents (Table 7-7) only provides the numbers for the broader categories rather than for the desired discipline-level. For instance, social science numbers are provided for 2002-2012 without specific numbers for economics, anthropology, etc. This approach, therefore, would not allow for an investigation into the main topic of interest, which is the demographic differences between the distinct disciplines–there is too much variety within the larger umbrella categories to discuss the fields’ compositions in this way. Therefore, I limit this discussion to demographics with respect to citizenship and race/ethnicity and, accordingly, use Table 7-4 “Doctoral degrees awarded, by citizenship, field, and race or ethnicity: 2002–12” from the NSF Report on Women, Minorities, and Persons with Disabilities in Science and Engineering*** as my data source.

Before getting into the different PhD science fields and their demographics, it’s worth noting the relative sizes of these disciplines. The following treemap depicts the relative sizes of the sciences as defined by NSF data on doctoral degrees awarded in 2012:

[Figure: treemap of the relative sizes of science fields, by doctoral degrees awarded in 2012]

The size of each squarified rectangle represents the number of degrees awarded within a given field while the color denotes the field’s parent category, as defined by the NSF. (Note that some studies are, in fact, their own parent categories. This is the case for Biological Sciences, Psychology, Computer Sciences, and Agricultural Sciences.) In the upcoming discussion of demographics, we will first discuss raw numbers of degrees earned and the relevant demographic components but will then pivot towards a discussion of percentages, at which point remembering the differences in size will be particularly helpful in piecing together the information into one cohesive idea of the demographics of “the sciences.”****

A decade of demographic snapshots: PhD’s in the sciences

The NSF data specifies two levels of information about the doctoral degrees awarded. The first level identifies the number of degree recipients who are US citizens or permanent residents as well as the number who are temporary residents. Though “[t]emporary [r]esident includes all ethnic and racial groups,” the former category is further broken down into the following subgroups: American Indian or Alaska Native, Asian or Pacific Islander, Black, Hispanic, Other or unknown, and White. In our first exploration of the data, we specify the raw number of degrees awarded to individuals in the specific ethnic and racial categories for US citizens and permanent residents as well as the number awarded to temporary residents. In particular, we start the investigation with the following series of stacked area charts (using flexible y-axes given the vastly different sizes of the disciplines):

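[Figure: stacked area charts of the raw number of doctoral degrees awarded in each field, 2002–12, by citizenship and race/ethnicity]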

In this context and for all following visualizations, the red denotes temporary residents while all other colors (the shades of blue-green and black) are ethnic and racial subsets of the US citizens and permanent residents. By illustrating the raw numbers, this chart allows us to compare the growth of certain PhD’s as well as see the distinct demographic breakdowns. While overall the number of science PhD’s increased by 39% from 2002 to 2012, Astronomy, Computer Science, Atmospheric sciences, and Mathematics and statistics PhD’s clearly outpaced other PhD growth rates with increases of 143%, 125%, 84%, and 80%, respectively. Meanwhile, the number of Psychology PhD’s actually decreased from 2002 to 2012 by 8%. While this was the only science PhD to experience a decline over the relevant 10-year period, a number of other disciplines grew at modest rates. For instance, the number of Anthropology, Sociology, and Agricultural Sciences PhD’s experienced increases of 15%, 16%, and 18% between 2002 and 2012, which pale in comparison to the vast increases seen in Astronomy, Computer Science, Atmospheric sciences, and Mathematics and statistics.
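For anyone who wants to reproduce growth figures like these from the NSF table, the calculation is a plain percent change between the 2002 and 2012 counts. Here is a small Python sketch (the replication scripts mentioned under Code are in R; Python is just used here for illustration). The counts below are made-up placeholders, not NSF figures, and the function name is my own.

```python
def percent_change(start: float, end: float) -> float:
    """Percent change from a starting count to an ending count."""
    return (end - start) / start * 100

# Hypothetical degree counts for one field (placeholders, NOT NSF data)
degrees_2002 = 200
degrees_2012 = 250

growth = percent_change(degrees_2002, degrees_2012)
print(f"growth from 2002 to 2012: {growth:.0f}%")
```

A negative result means the field shrank over the period, as Psychology did.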

While it is tempting to use this chart to delve into the demographics of the different fields of study, the use of raw numbers renders a comprehensive comparison of the relative sizes of groups tricky. For this reason, we shift over to visualizations using percentages to best get into the meat of the discussion–this also eliminates the need for different y-axes. In presenting the percentage demographic breakdowns, I supply three different visualizations: a series of stacked area graphs, a series of nightingale graphs (essentially, polar stacked bar charts), and a series of straightforward line graphs, which despite being the least exciting/novel are unambiguous in their interpretation:

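[Figure: stacked area graphs of percentage demographic breakdowns by field, 2002–12]

[Figure: nightingale graphs of percentage demographic breakdowns by field, 2002–12]

[Figure: line graphs of percentage demographic breakdowns by field, 2002–12]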

One of my main interests in these graphs is the prominence of temporary residents in various disciplines. In fact, it turns out that Economics is actually quite exceptional in terms of its percentage of temporary residents, which lingers around 60% for the decade at hand and is at 58% for 2012. (In 2012, out of the remaining 42% that are US citizens or permanent residents, 70% are White, 11% are Asian or Pacific Islander, 3% are Black, 3% are Hispanic, 0% are American Indian or Alaska Native, and 13% are Other or unknown.) Economics stands with Computer science, Mathematics and statistics, and Physics as one of the four subjects in the sciences for which temporary residents made up a higher percentage of the PhD population than white US citizens or permanent residents consistently from 2002 to 2012. Furthermore, Economics is also the science PhD with the lowest percentage of white US citizens and permanent residents–that is, a mere 30%. In this sense, the field stands out as wildly different in these graphs from its social science friends (or, more accurately, frenemies). On another note, it is also not hard to immediately notice that Psychology, which is not a social science in the NSF’s categorization, is so white that its nightingale graph looks like an eye with an immensely overly dilated pupil (though anthropology is not far behind on the dilated pupil front).
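The “mere 30%” figure follows from the two 2012 percentages quoted above, since the 70% is a share of the citizen and permanent resident group rather than of all economics PhDs. A quick Python check of that arithmetic, using only the rounded numbers from the text (the variable names are mine):

```python
# Rounded 2012 figures for economics, as quoted in the text
temporary_share = 0.58           # share of all PhDs awarded to temporary residents
white_share_of_citizens = 0.70   # share of the remaining group that is white

citizen_share = 1 - temporary_share
white_share_overall = citizen_share * white_share_of_citizens
print(f"white US citizens/permanent residents: {white_share_overall:.1%}")
```

With these rounded inputs the product comes out just under 30%, well below the 58% for temporary residents.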

Also readily noticeable is the thickness of the blue hues in the case of Area and ethnic studies–an observation that renders it undeniable that this subject is the science PhD with the highest percentage of non-white US citizens and permanent residents. Following this discipline would be the other social sciences Anthropology, Sociology, and Political science and public administration, as well as the separately categorized Psychology. However, it is worth noting that the ambiguity of the temporary residents’ racial and ethnic attributes leaves much of our understanding of the prominence of various groups unclear.

Another focal point of this investigation pertains to the time dimension of these visuals. When homing in on the temporal aspect of these demographic snapshots, there is a discouraging pattern–a lack of much obvious change. This is especially highlighted by the nightingale graphs since the polar coordinates allow the 2012 percentages to loop back next to the 2002 percentages and, thus, facilitate a simple start-to-end comparison. In most cases, the two points in time look incredibly similar. Of course, this does not necessarily mean there has been no meaningful change. For instance, there have been declines in the percentage of white US citizens and permanent residents in the subjects Area and ethnic studies, Psychology, Sociology, Anthropology, and Political science and public administration, which have then been offset by increases in other groups of individuals. However, the picture is incredibly stagnant for most of the disciplines, especially the hard sciences and the unusually quantitative social science of economics. In pairing the stagnant nature of these demographic snapshots with consistent calls for greater faculty diversity in the wake of campus protests, it is clear that there is a potential bottleneck since such lagging diversity in PhD disciplines can directly contribute to a lack of diversity at the faculty level.

Endnote

When the public discusses the demographics and diversity of “the sciences,” 1.5 dozen disciplines are being improperly blended together into generalized statements. To better understand the relevant dynamics, individuals should zero in on the discipline level rather than refer to larger umbrella categories. As our investigation shows, the demographic breakdowns of these distinct subjects are as fundamentally different as their academic methodologies–methodologies which can be illustrated by the following joke that I can only assume is based on a true story:

As a psychological experiment, an engineer, a chemist, and a theoretical economist are each locked in separate rooms and told they won’t be released until they paint their entire room. They are each given a can of blue paint which holds about half the paint necessary to paint the room and then left alone. A few hours later the psychologist checks up on the three subjects.

(1) The engineer’s walls are completely bare. The engineer explains that he had worked out that there wasn’t enough paint to cover all the walls so he saw no point in starting.

(2) The chemist’s room is painted in faded, streaky blue. “There wasn’t enough paint, so I diluted it,” she explains.

(3) In the economist’s room, the floor and the ceiling are completely blue, and there’s a full can of paint still sitting on the floor. The experimenter is shocked and asks how the economist managed to paint everything. The economist explains, “Oh, I just painted the rational points.”

And with an unwavering appreciation for that bit, I hope to be one of the ~20-30 (who knows?) % of white US citizens/permanent residents in the economics PhD cohort of 2021.

PS-Happy 2016 everyone!

Footnotes

* I had yet to take a driving test at a DMV. I did this successfully at age 21. But, I will not drive your car.

** The NSF divides subjects up into S&E (science and engineering) and non-S&E categories. In this context, I am only discussing the subjects that fall under the umbrella of science. It would be simple to extend the approach and concept to the provided numbers for engineering.

*** This table explains that the exact source for this information is: National Science Foundation, National Center for Science and Engineering Statistics, special tabulations of U.S. Department of Education, National Center for Education Statistics, Integrated Postsecondary Education Data System, Completions Survey, 2002–12.

**** In particular, the tiny size of the group of History of Science PhD’s allows for much more variability year-to-year in terms of demographics. Only 19-34 degrees were given out on an annual basis from 2002-2012. In this case, size of the program is responsible for the wildly evident changes in demographic composition.

Code

Data and R scripts necessary to replicate visualizations are now up on my GitHub! See the NSF_Demographics repo. Let me know if you have any questions or issues with the R script in particular.

Further directions for work
  • Create gif of treemap using years 2002-2012 to replace the static version for just 2012
    • Or use a slider via some D3 magic
  • Follow-up by comparing the gender compositions
  • Look into the development and change history of the US Office of Management and Budget standards for racial and ethnic categories
    • Just curious as to the timeline of changes and how categorization changes affect our available data


Which U.S. State Performs Best in the New Yorker Caption Contest?

I wrote about this topic with Bob Mankoff for the New Yorker.

You can read the piece here

It builds off of my previous work on the New Yorker Caption contest. (New visuals, new data, and edits from real editors!) Many, many thanks to Bob for giving me access to troves of fascinating data as well as making great edits and alterations to this piece (including the addition of my new favorite phrase “nattering nabobs”).

Bonus cartoon of surprising relevance given Alaska’s success in terms of caption win rate:

Daily Cartoon for Tuesday, September 1st via The New Yorker. We figure Alaska’s wins and submissions to the contest will decline if it comes to this…

Endnote

Code and raw data for replicating these choropleths are available at my NYer_Choropleths GitHub repo. Also, thanks to Sarah Levine for using her QGIS knowledge to help me tame maps of the US.



EDUANALYTICS 101: An Investigation into the Stanford Education Space using Edusalsa Data

Update [4-18-16]: Thanks to Stuart Rojstaczer for finding an error in my grade distribution histograms. Just fixed them and uploaded the fixed R script as well. Such is the beauty of internet feedback!

Note: This is the first in a series of posts that I am putting together in partnership with Edusalsa, an application based at Stanford that seeks to improve how college students explore and choose their courses. Our goal in these posts is to take advantage of the unique data collected from students’ use of the application in order to learn more about how to model and discuss the accumulation and organization of knowledge within the Stanford community as well as within the larger, global education space. (You can read the post here too.)

Course Syllabus

You are frozen in line. This always happens. You don’t know whether to pick the ENGLISH, PHYSICS with double CS combo that you always order or whether to take a risk and try something new. There are thousands of other options; at least a hundred should fit your strict requirements and picky tastes…Hey, maybe you’d like a side of FRENCH! But now you don’t even know what you should get on it: 258, 130, or 128. You are about to ask which of the three goes best with ENGLISH 90 when you wake up.

You realize you missed lunch… and you need to get out of the library.

Complex choices, those with a large number of options (whether in a deli or via online course registration), often force individuals to make choices haphazardly. In the case of academics, students find themselves unable to bulldoze their way through skimming all available class descriptions, and, accordingly, pick their classes with the help of word of mouth and by simply looking through their regular departments’ offerings. However, it is undoubtedly the case that there are ways to improve matching between students and potential quarterly course combinations.

In order to better understand how one could improve the current course choice mechanism, one must first better understand the Stanford education space as well as the myriad of objects (courses, departments, and grades) and actors (students and professors) that occupy it. The unique data collected from students’ use of Edusalsa provides an opportunity to do just this. In this post, organized in collaboration with the Edusalsa team, we will use this evolving trove of data to discuss three overarching questions: [1] How can we measure the interest surrounding, or the popularity of, a course/department? (In conjunction with that question, what should we make of enrollment’s place in measuring interest or popularity?) [2] What is the grade distribution at Stanford, on the whole as well as on the aggregate school-level? [3] How do students approach using new tools for course discovery?

[1] How can we measure the interest surrounding, or the popularity of, a course/department?

One of the first areas of interest that can be examined with the help of Edusalsa’s data is Stanford student interest across courses and departments. Simply put, we can use total views on Edusalsa, aggregated both by course and by department, as a proxy for interest in, or the popularity of, a course. [See technical footnote 1 (TF1)] In order to visualize the popularity of a collection of courses and departments, we use a treemap structure to illustrate the relative popularities of two sets of academic objects: (1) all courses that garnered at least 20 views, and (2) all departments that garnered at least 30 views. [TF2]

course_tree

dept_tree

The size of the rectangles within the treemap corresponds to the number of views while the darkness of the color corresponds to the estimated enrollment by quarter for classes and entire departments. We notice that, at the course-level, the distribution of colors throughout the rectangles seems disorganized over the size dimension. In other words, there does not seem to be a strong relationship between enrollment and views at the course level. On the other hand, from a cursory look at the second graph, the department treemap seems to illustrate that departments with larger aggregate enrollments (that is, the sum of all enrollments for all classes in a given department) have more views.

What should we make of enrollment’s place in measuring interest or popularity?

While these particular treemaps are useful for visually comparing the number of views across courses and departments, they do not clarify what, if any, is the nature of the relationship between enrollment and views for these two subsets of all courses and departments. [TF2] Due to the treemaps’ analytic shortcomings, we address the legitimacy of our previous intuitions about the relationship by simply regressing views on enrollment at both the course- and department-level. See below for the relevant plot at the course-level:

course_scatter

The coefficient on enrollment in the simple linear regression model, represented by the blue line in the above plot, while positive, is not statistically significant. We can also see this when considering the width of the light green area above (the 99% confidence interval) and the narrower gray area (the 95% confidence interval), as both areas comfortably include an alternative version of the blue regression line for which the slope is 0. The enrollment variable’s lack of explanatory power is further evidenced by the fact that, in this simple regression model framework, enrollment variation only accounts for 1.3% of the variation in views.
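
For readers who want to see the mechanics behind a slope and an R^2 like the ones quoted above, here is a minimal Python sketch of a simple linear regression. The original analysis was done in R, so this is an illustration rather than the script behind the plot. The function name simple_ols and the enrollment and view numbers are made up for the example; they are not Edusalsa data.

```python
from statistics import mean


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical (enrollment, views) pairs for six courses, NOT the real data.
enrollment = [20, 45, 80, 120, 200, 310]
views = [35, 22, 60, 28, 41, 30]

intercept, slope, r_squared = simple_ols(enrollment, views)
print(f"slope = {slope:.4f}, R^2 = {r_squared:.3f}")
```

An R^2 close to 0, as in the course-level plot, means that knowing a course's enrollment tells you very little about how many views it received.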

We now turn to the department-level, which seemed more promising from our original glance at the association between colors and sizes in the relevant treemap:

dept_scatter

In this case, the coefficient on enrollment is statistically significant at the 0.1% level and communicates that, on average, a 1,000 person increase in enrollment for a department is associated with an increase of 65 views on Edusalsa. The strength of the association between enrollment and views is further evidenced by the 95% and 99% confidence intervals. In fact, the explanatory power of the enrollment variable in this context is strong to the point that the model accounts for 53.9% of variation in Edusalsa views. [TF3]

Theory derived from the comparison of course-level and department-level relationships

The difference between the strength of enrollment’s relationship with views at the course and at the department level is clear and notable. I believe that this difference is attributable to the vast heterogeneity in interest across courses, meaning there is extreme variance in terms of how much interest a course garners within a given department. Meanwhile, the difference in interest levels that is so evident across courses disappears at the department-level, once all courses are aggregated. This observation potentially serves as evidence of a current course search model in which students rigidly search within specific departments based on their requirements and fields of study, but then break up their exploration more fluidly at the course-level based on what they’ve heard is good or which classes look the most interesting etc. While the students know what to expect from departments, courses can stand out via catchy names or unique concepts in the description.

More possible metrics, and way more colors…

There are a few other metrics beyond views and enrollment that we might be interested in when trying to assess or proxy for interest surrounding a course or department. In order to compare some of these alternative metrics across various departments we present the below heat map, which compares a set of six metrics across the top 15 departments by enrollment size:

heat

While we have discussed enrollment before, I also include number of courses in the second column as an alternative measurement of the size of the department. Rather than defining size by number of people who take classes in the department, this defines size by the number of courses the department offers. The darker greens of CEE, Education, and Law illustrate that these are the departments parenting the most courses.

Another new metric in the above is the fifth column, a metric for number of viewers, which refers to the number of unique individuals who visited a course page within a department. The inclusion of this measurement allows us to avoid certain individuals exerting improperly large influence over our measures. For example, one person who visits Economics course pages thousands of times won’t be able to skew this metric though she could skew the views metric significantly. Note that the columns for number of views and number of viewers are very similar, which indicates that, beyond some individuals in EE, departments had individuals viewing courses at similar frequencies.

The last new concept we introduce in the heat map is the notion of normalizing by enrollment, seen in columns four and six, so as to define metrics that take into account the size of the Stanford population that is already involved with these departments. Normalizing views and viewers in this way makes a large impact. Most notably, CS is no longer the dominant department, and instead shares the stage with other departments like Psychology, MS&E, MEE, etc. This normalized measure could be interpreted to proxy for the interest outside of the core members of the department (e.g., majors and planned majors), in which case Psychology is certainly looking interesting to those on the outside looking in.
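
To make the distinction between views, unique viewers, and their enrollment-normalized versions concrete, here is a small Python sketch. It assumes a log of (user ID, department) pairs, one per course-page view; that log format, the function name department_metrics, and all of the numbers are hypothetical and only meant to illustrate the definitions above.

```python
from collections import defaultdict


def department_metrics(page_views, enrollment):
    """Compute views, unique viewers, and both normalized by enrollment.

    page_views: iterable of (user_id, department) pairs, one per page view.
    enrollment: dict mapping department -> total enrollment.
    """
    views = defaultdict(int)
    viewers = defaultdict(set)
    for user_id, dept in page_views:
        views[dept] += 1            # every visit counts as a view
        viewers[dept].add(user_id)  # each person counts once as a viewer

    metrics = {}
    for dept in views:
        n_enrolled = enrollment[dept]
        metrics[dept] = {
            "views": views[dept],
            "viewers": len(viewers[dept]),
            "views_per_enrolled": views[dept] / n_enrolled,
            "viewers_per_enrolled": len(viewers[dept]) / n_enrolled,
        }
    return metrics


# Hypothetical log: user u1 visits ECON pages three times.
log = [
    ("u1", "ECON"), ("u1", "ECON"), ("u1", "ECON"), ("u2", "ECON"),
    ("u3", "PSYCH"), ("u4", "PSYCH"),
]
enrollment = {"ECON": 400, "PSYCH": 100}  # made-up enrollment totals

metrics = department_metrics(log, enrollment)
for dept, values in sorted(metrics.items()):
    print(dept, values)
```

In this toy log, ECON has twice as many views as PSYCH but the same number of viewers, and once enrollment is taken into account PSYCH comes out ahead on both normalized measures.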

[2] What is the grade distribution at Stanford, on the whole as well as on the aggregate school-level?

The second topic that we cover in this post pertains to that pesky letter attached to a course–that is, grades. Our obtained data included grade distributions by course. [TF4] We use this data to build the frequency distribution for all grades received at Stanford. The following histogram illustrates that the most commonly received grade during the quarter was an A while the median grade was an A- (red line) and the mean grade was a 3.57 (blue line):

stanford_dist

While this visual is interesting in and of itself since it presents all Stanford course offerings solely by grade outcomes, it would also be meaningful to compare different subsets of the Stanford education space. In particular, we choose to use a similar technique to compare grading distributions across the three schools at Stanford–the School of Humanities & Sciences, the School of Engineering, and the School of Earth, Energy and Environmental Sciences–in order to see whether there is any notable difference across the groups:

school_dist

The histograms for the three schools present incredibly similar distributions–to the extent that at first I thought I mistakenly plotted the same school’s distribution three times. All three have medians of A- and the means span a narrow range of 0.08; the means are 3.52, 3.60, and 3.58 for the Humanities & Sciences, Engineering, and Earth Sciences schools, respectively. [TF5]

[3] How do students approach using new tools for course discovery?

Since we have discussed views and other metrics both across classes and departments, it is worth mentioning what the Edusalsa metrics look like over individual users. Specifically, we are curious how many times unique users view courses through Edusalsa. In examining this, we are inherently examining the level of “stickiness” of the site and the aggregated view of how users interact with new course tools. In this case, the stickiness level is low, as illustrated below by both (i) a quickly plunging number of unique individuals as the number of course views grows, and (ii) a linear decline of number of unique individuals as the number of course views grows when using a log-log plot. [TF6]

stick

The negative linear relationship between the log-transformed variables in the second panel (evidenced by the good fit of the above blue line) is indicative of a power-law (monomial) relationship with a negative exponent between number of course views and number of unique individuals. [TF7]  This simply indicates that, as is the case with most new applications, so-called stickiness is low. It will be interesting to see whether this changes given the new addition of the ability to create an account.
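
A straight line on log-log axes corresponds to a relationship of the form y = a·x^n (see TF7), so the slope and intercept of that line recover n and log(a). The Python sketch below shows the idea on synthetic numbers generated from a known power law; the function name fit_power_law and the data are assumptions for illustration, not the Edusalsa data or the R code used for the plot.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**n via least squares on (log x, log y).

    Returns (a, n). All x and y values must be positive.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    # Slope of the log-log line is the exponent n.
    n = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    # Intercept of the log-log line is log(a).
    log_a = mean_y - n * mean_x
    return math.exp(log_a), n


# Synthetic data: number of users who viewed exactly k courses,
# generated from a known power law with exponent -1.5.
course_views = [1, 2, 4, 8, 16]
unique_users = [1000 * k ** -1.5 for k in course_views]

a, n = fit_power_law(course_views, unique_users)
print(f"a = {a:.1f}, n = {n:.2f}")
```

A negative n is what "low stickiness" looks like in this framing: the number of users falls off quickly as the number of courses viewed grows.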

School’s out (for summer)

Our key insights in this post lie in the depths of section [1], which discussed

evidence of a current course search model in which students rigidly search within specific departments based on their requirements and fields of study, but then break up their exploration more fluidly at the course-level

With evolving data collection, we will continue to use Edusalsa data in order to learn more about the current course search model as well as the specific Stanford education space. Future steps in this line of work will include analyzing the dynamics between departments and the courses that populate them using network analysis techniques. (There is a slew of possible options on this topic: mapping out connections between departments based on overlap in the text of course descriptions, number of cross-listings, etc.)

There is ample room for tools in the education space to help students search across conventional departments, rather than strictly within them, and understanding the channels through which individuals most naturally categorize or conceptualize courses constitutes a large chunk of the work ahead.

Technical footnotes
  1. Edusalsa views by course refers to the number of times an individual viewed the main page for a course on the site. Technically, this is when the data.url that we record includes the suffix “/course?c=DEPT&NUM” where DEPT is the department abbreviation followed by the number of the course within the department. Views aggregated by department are equivalent to the sum total of all views for courses that are under the umbrella of a given department.
  2. We only illustrate courses with at least 20 views and departments with at least 30 views in order that they will be adequately visible in the static treemap. Ideally, the views would be structured in an interactive hierarchical tree structure in which one starts at the school level (Humanities & Sciences, Engineering, Geosciences) and can venture down to the department level followed by the course level.
  3. Though it might seem as though Computer Science is an outlier in this dataset whose omission could fundamentally alter the power of the simple regression model, it turns out even after omitting CS the coefficient on enrollment remains significant at the 0.1% level while the R^2 remains high as well at approximately 0.446.
  4. The grade distribution data is self-reported by Stanford students over multiple quarters.
  5. While the distributions are very similar aggregated over the school level, I doubt they would be as similar at the smaller, more idiosyncratic department-level. This could be interesting to consider across similar departments, such as ME, EE, CEE, etc. It could also be interesting to try and code all classes at Stanford as “techie” or “fuzzy” a la the quintessential Stanford student split and see whether those two grade frequency distributions are also nearly identical.
  6. We found that the ID codes we use to identify individuals can change for a given person in the long run. We believe this happens rarely in our dataset; however, it is worth noting nonetheless. Due to this caveat, some calculations could be over- or underestimates of their true values. For instance, the low stickiness for Edusalsa views could be overestimated, as some of the users who are coded as distinct people are the same. Under the same logic, in the heat map, the number of viewers could be an overestimate.
  7. The straight line fit in a log-log plot indicates a monomial relationship form. Monomials, which are polynomials with one term (i.e., y=ax^n), appear as straight lines in log-log plots such that n and log(a) correspond to the slope and intercept, respectively.
Code and replication

All datasets and R scripts necessary to recreate these visuals are available at my edusalsa GitHub repo!


The Multidimensional Success of Pixar Films Visualized

Get the popcorn

Last Wednesday, as I watched the letter “I” flattened by a familiar squeaky little lamp, I found myself, for the first time in half a decade, about to enter a new Pixar universe. I hadn’t seen a Pixar film in theaters since Toy Story 3, a movie revolving around a boy’s departure to college that was released the same summer my high school graduate cohort and I were also due to leave behind plush bunnies and Hess trucks in pursuit of profound academic knowledge…and a few beers. Now, five years later, I was watching Inside Out, another movie that felt meaningfully-timed due to its release around my one-year anniversary of college graduation. As time has passed since those four years of accelerated, electric activity, we are all left wondering which memories will inevitably roll down into the dusty abyss of lost moments and which will solidify their spots as core memories, turning within our own mental Kodak Carousels.

This train of thought led me to ponder not only key moments in my own lifetime but also those in the Pixar feature film universe’s almost 20-year existence. Considering all 15 movies Pixar has created and released, are some doomed for the abyss of our collective memory while others are permanent pillars of the Pixar canon? In other words, how do the individual units within this manifold collection of films stack up against one another? Moreover, how can we visualize Pixar’s trajectory over the past two decades?

Pixar and metrics of success

In attempting to illustrate Pixar’s evolution over time, I am inclined to use “success” as a metric of interest. Pixar is considered wildly successful—but how do we define success given its multidimensional nature? Well, for one, success is often substantiated through winning awards. Even Pixar’s first movie, Toy Story, which was released in November 1995, proceeded to receive a Special Achievement Academy Award for being the first feature-length computer-animated film, and this was years before the introduction of the Best Animated Film Academy Award in 2001. In fact, since the latter’s inception, Pixar has won the award for Best Animated Film 7 out of 14 years, despite only releasing films in 11. Other meaningful metrics of success include quality ratings, such as those maintained by Rotten Tomatoes and IMDb, and… of course, money. Thus, in tracing out Pixar’s success, we consider three dimensions of success: award victories (Best Animated Film Academy Award wins), quality ratings (we treat Rotten Tomatoes % Fresh as a measure of critical acclaim and IMDb ratings as a measure of public acclaim), and commercial success (Opening Weekend Gross). (We use opening weekend gross since there is not yet a final box office number for Inside Out.)

A path lined with multidimensional success

In order to map out Pixar’s trajectory, we plot all 15 movies released by Pixar using differing colors and sizes of data points in order to represent all three aforementioned dimensions of success. In this graph, the main focus of interest is the % Fresh Rotten Tomatoes rating, which specifies what percentage of critics’ reviews were positive. (Note: we truncate the y-axis in order to better emphasize the evolution of quality over time.) This metric accurately separates out those regularly cited as subpar Pixar movies: Cars, Cars 2, Brave, and Monsters University. We use locally weighted scatterplot smoothing (“loess”) to fit a curve to the dataset, thus charting the movement of % Fresh over time. The loess curve shows us that Pixar took a dip in critical acclaim between 2010 and 2015–what with the release of Cars 2, Brave, and Monsters University–however, Inside Out’s release has tugged the loess curve back up to pre-2011 levels!

pix1

In this sense, Inside Out marks a return to the Pixar of emotive toys and robots—not to mention the most sob-inducing 4 minutes in all of animated film history. The above plot also illustrates Pixar’s success at the Oscars, with films depicted by blue points as Best Animated Film Academy Award winners. Lastly, in terms of opening weekend gross, we can see that despite being on the lower end of quality ratings, the disappointing movie grouping of Cars, Cars 2, Brave, and Monsters University did not make less money during opening weekend than other films. In fact, in comparing these four films to the other 5 films released since 2005, the average opening weekend gross is actually larger—$79.46 million rather than $75.78 million.

Pivoting from a measure of critical acclaim to a measure of public acclaim in the quality realm, we now plot the same dimensions of success as defined before but we substitute IMDb scores for the Rotten Tomatoes % Fresh metric. This set of scores also suggests mediocrity in Cars, Cars 2, Brave, and Monsters University—however, it also puts A Bug’s Life in the same subpar quality category. Again, we use a loess regression line to exhibit the movement in quality ratings of Pixar movies over time. As was the case before, this line also provides evidence of a return to the old Pixar.

pix2

However, there is one element to note about the nature of IMDb scores–that is, they are often higher when a film is just out. This is because the first people to see and rate films are the hardcore fans, which therefore contributes to a “hype effect,” superficially inflating the aggregate rating. (Speaking of hype and quality discussions…) This could potentially be an issue in currently measuring the public acclaim of Inside Out, as its rating will likely fall to WALL-E / Up levels as months pass.

Despite this particular caveat, the graph still serves as evidence of an improvement in Pixar film quality following its recent senior slump (~ ages 15-18)–an improvement that is fitting since, in a few months, we will be able to welcome Pixar to the world of 20-somethings, the beginning of a new decade in which we are content to forget about the mishaps of adolescence.

Roll the credits

In short, Pixar has faltered in its adolescence, sometimes producing movies that fail to depict the nuanced emotions that color the memories organized within our seemingly endless stockpiles of human experiences. However, just like the wonderfully colored marbles of memories in the Pixar universe, these fifteen films exist within the collective memory as works of art that are, no doubt, greater than the sum of their tangible metrics of success. If Joy herself were to project my memory of Toy Story in the headquarters of my brain, I would not see a small black data point—I would see “Andy” written on the bottom of Woody’s boot and feel something that is beyond a simple, neat linear combination of joy and melancholy—something beyond my or Pixar’s capacity for visualization… Something you can’t even see with 3-D glasses.

«Visualization update»

Thanks to discussion of the aforementioned graphs in the /r/dataisbeautiful universe, I have been made acutely aware of improvements that should be made to my visualizations. In particular, there are two issues from my previous work that are worth quickly addressing:

  1. In my original visualizations, area is scaled non-linearly with the opening weekend gross data. This was a rookie mistake on my part, especially considering that one of the first things the Wikipedia “Bubble chart” article explains is that, “if one chooses to scale the disks’ radii to the third data values directly, then the apparent size differences among the disks will be non-linear and misleading.” As /u/FlailingMildly explained, “It looks to me like the diameter of the points scales with opening weekend gross (110 looks roughly twice as wide as 50). However, our brain doesn’t look at diameter, it looks at area. So the 110 looks more than four times as large as the 50.” 
  2. The blue lines from the original graphs are loess curves, or locally weighted scatterplot smoothings. I reasoned that this choice of smoothing was acceptable as an exploratory feature since the original paper that developed loess explains that: “The first [major use of this local-fitting methodology] is simply to provide an exploratory graphical tool.” However, I knew it could be argued that this curve is over-fitted and better for the purposes of prediction than for conceptual modeling. In the end, individuals on the subreddit came to the conclusion that, in this particular case, the loess curves are not useful since the graph is easy to read without any type of smoothing method. In short, the overarching consensus was that this type of curve is best used for smoothing noisy data–a category to which my Pixar csv file definitely does not belong!
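
The area-versus-diameter point in item 1 can be checked with a little arithmetic. The Python sketch below compares the two scaling choices for the 50 and 110 values mentioned in the quoted comment; the function names are made up for this illustration, and the original graphs were produced in R rather than with this code.

```python
import math


def radius_linear(value, scale=1.0):
    """Naive choice: radius proportional to the value, so area grows with value**2."""
    return scale * value


def radius_area_true(value, scale=1.0):
    """Radius proportional to sqrt(value), so area is proportional to the value."""
    return scale * math.sqrt(value)


def circle_area(radius):
    return math.pi * radius ** 2


# Opening weekend grosses (in $ millions) from the quoted comment.
small, large = 50, 110

naive_ratio = circle_area(radius_linear(large)) / circle_area(radius_linear(small))
fixed_ratio = circle_area(radius_area_true(large)) / circle_area(radius_area_true(small))

print(f"value ratio:                  {large / small:.2f}")
print(f"area ratio, radius ~ value:   {naive_ratio:.2f}")  # about 4.84
print(f"area ratio, radius ~ sqrt(v): {fixed_ratio:.2f}")  # about 2.20
```

With radius tied directly to the value, a film that grossed 2.2 times as much is drawn with nearly five times the area; taking the square root first keeps the area ratio equal to the value ratio.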

In order to address these genuine issues, I made two quick changes to the previous graphs: (1) I scaled opening weekend box office gross to the area of the circles rather than to their radii, and (2) I excluded the blue loess curves. See the new graphs here:

pix1.1

pix1.2

Lastly, I also present a similarly constructed graph with a y-axis corresponding to Metacritic scores (to add another quality metric into the mix):

pix1.3

Code

Data and R scripts needed to recreate all the included visualizations are available via my Pixar GitHub repo!


Geography of Humor: The Case of the New Yorker Caption Contest

Update [9-23-15]: Also check out the newest work on this topic: Which U.S. State Performs Best in the New Yorker Caption Contest?

Intro.

About 10 years ago The New Yorker began a weekly contest. It was not a contest of writing talents in colorful fiction nor of investigative prowess in journalism; instead, it was a contest of short and sweet humor. Write a caption for a cartoon, they said. It’ll be fun, they said. This will help our circulation, the marketing department said. Individuals like me, who back at age 12 in 2005 believed The New Yorker was the adult’s version of Calvin and Hobbes that they most enjoyed in doctors’ waiting rooms, embraced the new tradition with open arms.

Now, 10 years later, approximately 5,372 captions are submitted each week, and just a single winner is picked. Upon recently trying my own hand (and failing unsurprisingly given the sheer magnitude of competing captions) at the contest, I wondered, who are these winners? In particular, since The New Yorker always prints the name and place of residence of the caption contest winner, I wondered, what’s the geographical distribution of these winners? 

In order to answer this question, I used my prized subscriber access to the online Caption Contest archive. This archive features the winning caption for each week’s cartoon (along with two other finalist captions) and the name/place of residence of the caption creator. (The archives also feature all other submitted captions–which is super interesting from a machine learning perspective, but I don’t focus on that in this piece.) So, I snagged the geographic information on the past 10 years of winners and went with it.

The basics

For this analysis, I collected information on the first 466 caption contests–that is, all contests up to and including the following:

The New Yorker Caption Contest #466

Before getting into the meat of this discussion, it is worth noting the structure of the contest as well as the range of eligible participants. See this quick explanation from The New Yorker:

Each week, we provide a cartoon in need of a caption. You, the reader, submit your caption below, we choose three finalists, and you vote for your favorite… Any resident of the United States, Canada (except Quebec), Australia, the United Kingdom, or the Republic of Ireland age eighteen or older can enter or vote.

Thus, the contest consists of two rounds: one in which the magazine staff sift through thousands of submissions and pick just three, and one in which the public votes on the ultimate winner out of the three finalists. Furthermore, the contest is open to residents outside the United States–a fact that is easy to forget when considering how rarely individuals from other countries actually win. Out of 466 caption contest winners, only 12 are from outside the United States–2 from Australia, 2 from British Columbia (Canada), and 8 from Ontario (Canada). Though they are allowed to compete, no one from the United Kingdom or the Republic of Ireland has ever won. In short, 97.42% of caption contest winners (454 of 466) are from the U.S.

Moving to the city-level of geography, it is unsurprising that The New Yorker Caption Contest is dominated by, well, New Yorkers. New York City has 62 wins, meaning New Yorkers have won 13.3% of the contests. In order to fully understand how dominant this makes New York, consider the fact that the city with the next most caption contest wins is Los Angeles with a mere 18 wins (3.9% of contests). The graphic below depicting the top 8 caption contest cities further highlights New York’s exceptionalism:

cities

Source: New Yorker Caption Contest Archive; Tool: ggplot2 package in R.

The geographic distribution: a state-level analysis

While both the country- and city-level results are dominated by the obvious contenders (the United States and New York City respectively), the state-level analysis is much more compelling.

In this vein, the first question to address is: which states win the most contests? To answer this, I present the following choropleth in which the states are divided into five categories of equal size (each category contains 10 states) based on the number of contests won. (This method uses quantiles to classify the data into ranges; however, there are other methods one could use as well.) Visualizing the data in this way allows us to quickly perceive areas of the country that are caption-winner-rich as well as caption-winner-sparse:

totalwins

Source: New Yorker Caption Contest Archive; Tool: choroplethr package in R.

This visualization illustrates that the most successful caption contest states are either east coast or west coast states, with the exception of Illinois (due to Chicago’s 16 wins). The most barren section of the country is unsurprisingly the center of the country. (In particular, Idaho, Kansas, North/South Dakota, West Virginia, and Wyoming have never boasted any caption contest winners.)
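
The quantile classification used for the map above can be sketched in a few lines of Python. This is only an illustration of the idea (rank the states, then cut the ranking into equal-sized groups); the choroplethr package used for the actual map may handle ties and boundaries differently. The function name quantile_classes, the placeholder state names, and the win counts are all made up.

```python
def quantile_classes(values, n_classes=5):
    """Assign each key to one of n_classes equal-sized groups by rank.

    values: dict mapping a name (e.g. a state) to a number (e.g. wins).
    Returns a dict mapping name -> class index, 0 (lowest) to n_classes - 1.
    Ties are broken alphabetically by name, which is an arbitrary choice.
    """
    ranked = sorted(values, key=lambda name: (values[name], name))
    group_size = len(ranked) / n_classes
    return {
        name: min(int(rank / group_size), n_classes - 1)
        for rank, name in enumerate(ranked)
    }


# Ten hypothetical "states" with made-up win counts (0, 3, 6, ..., 27).
wins = {f"state_{i:02d}": 3 * i for i in range(10)}

classes = quantile_classes(wins)
for name in sorted(classes):
    print(name, wins[name], "-> class", classes[name])
```

Because every class holds the same number of states, the top class can span a very wide range of values when a few states have far more wins than the rest.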

While using quantiles to classify the data into ranges is helpful, it gives us an overly broad last category–the darkest blue class contains states with win totals ranging from 14 to 85. If we want to zoom in and compare the states within this one category, we can pivot to a simple bar chart for precision’s sake. The following graph presents the number of contests won among the top ten states:

top10

Source: New Yorker Caption Contest Archive; Tool: ggplot2 package in R.

New York and California are clearly the most dominant states with 85 and 75 wins respectively, which is to be expected considering how populous the two are. If we were to take into account the population size of a given state, that would most definitely yield a superior metric in terms of how well each state does in winning the contest. (It would also be interesting to take into account the number of The New Yorker subscribers by state, but I haven’t been able to get ahold of that data yet, so I am putting a pin in that idea for now.)

Therefore, I normalize these counts by creating a new metric: number of caption contests won per one million state residents.  In making this change, the map colors shift noticeably. See the following choropleth for the new results:

permill

Source: New Yorker Caption Contest Archive; Tool: choroplethr package in R.
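
The normalization behind this map is simple arithmetic: divide a state's wins by its population and multiply by one million. Here is a minimal Python sketch; the function name and the two example states with round-number populations are hypothetical and are not the figures used for the map.

```python
def wins_per_million(wins, population):
    """Number of contest wins per one million residents."""
    return wins * 1_000_000 / population


# Hypothetical states with made-up win counts and populations.
states = {
    "small_state": {"wins": 5, "population": 625_000},
    "large_state": {"wins": 80, "population": 20_000_000},
}

rates = {
    name: wins_per_million(info["wins"], info["population"])
    for name, info in states.items()
}

# Rank states from highest to lowest rate.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.2f} wins per million residents")
```

A state with only a handful of wins can jump to the top of the ranking once its small population is taken into account.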

Again, the last category is the one with the broadest range (2.425 to 7.991 wins per million residents). So, once more, it is worth moving away from cool, colorful choropleths and towards the classical bar chart. In comparing the below bar graph with the previous one, one can quickly see the difference made in normalizing by population:

top10cap

Source: New Yorker Caption Contest Archive; Tool: ggplot2 package in R.

For one, the once dominant New York falls behind new arrivals Vermont and Rhode Island while the similarly previously dominant California is nowhere to be seen! Other states that also lose their place among the top ten are: Illinois, New Jersey, and Pennsylvania. Meanwhile, the four new states in this updated top ten are: Alaska and New Hampshire as well as the previously mentioned Rhode Island and Vermont. Among these four new arrivals, Vermont stands clearly ahead of the pack with approximately 8 caption contest wins per million residents.

The high counts per million for states like Vermont and Rhode Island suggest a relationship that many were likely considering throughout this entire article–isn’t The New Yorker for liberals? Accordingly, isn’t there a relationship between wins per million and liberalness?

Those damn liberal, nonreligious states

Once we have normalized caption contest wins by population, we still have not completely normalized states by their likelihood of winning the contest. This is because there is a distinct relationship between wins per million residents and evident political markers of The-New-Yorker-types. In particular, consider Gallup’s State of the States measures of “% liberal” and “% nonreligious.” First, I present the strong association between liberal percentages and wins per million:

libs

Source: New Yorker Caption Contest Archive; Tool: ggplot2 package in R.

The above is a scatterplot in which each point is a state (see the small state abbreviation labels) and the blue line is a linear regression line (the shaded area is the 95% confidence region) fit to the data. The conclusion is unmistakable; states that are more liberal tend to win more contests per million residents. Specifically, the equation for the linear regression line is:

wins_per_million = -3.13 + 0.22(pct_liberal)

This means that a 1 percentage point increase in the liberal percentage is associated with an increase of 0.22 caption contest wins per million residents. The R^2 (in this case, simply the square of the correlation coefficient r between wins_per_million and pct_liberal since there is just one explanatory variable in the regression) is 0.364, meaning that 36.4% of response variable variation is explained by this simple model. (The standard error on the coefficient attached to pct_liberal is only 0.04, meaning the coefficient is easily statistically significant at the 0.1% level.)
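To make those numbers a little more tangible, here is a short Python sketch that plugs values into the fitted line and recomputes the t-statistic and correlation from the reported figures. The coefficients, standard error, and R^2 are the ones quoted above; the two liberal percentages (20 and 30) are hypothetical inputs chosen only to illustrate the slope:

```python
import math

# Coefficients and statistics as reported in the article.
INTERCEPT = -3.13
SLOPE = 0.22        # wins per million per percentage point liberal
SLOPE_SE = 0.04     # standard error on the slope
R_SQUARED = 0.364


def predicted_wins_per_million(pct_liberal):
    """Evaluate the fitted regression line at a given liberal percentage."""
    return INTERCEPT + SLOPE * pct_liberal


# Hypothetical inputs, used only to illustrate the slope.
low, high = 20.0, 30.0
print(predicted_wins_per_million(low))
print(predicted_wins_per_million(high))

# t-statistic for the slope: the estimate divided by its standard error.
t_stat = SLOPE / SLOPE_SE

# With one explanatory variable and a positive slope, r = sqrt(R^2).
r = math.sqrt(R_SQUARED)
print(t_stat, r)
```

A state at 30% liberal would be predicted at about 3.47 wins per million, 2.2 more than a state at 20%. The slope sits about 5.5 standard errors away from zero, and the implied correlation r is roughly 0.60.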

Also strong is the association between nonreligious percentages and wins per million, presented in the graph below:

nonreg

Source: New Yorker Caption Contest Archive; Tool: ggplot2 package in R.

This plot is very similar to the previous one, most likely because states with high liberal percentages tend to have high nonreligious percentages as well. The linear regression line that is fit for this data is:

wins_per_million = -1.37 + 0.09(pct_nonreligious)

The relevant conceptual interpretation is that a 1 percentage point increase in the nonreligious percentage is associated with an increase of 0.09 caption contest wins per million residents. The R^2 for this model is 0.316, so 31.6% of response variable variation is explained by the model. (Again, the coefficient of interest, this time the coefficient attached to pct_nonreligious, is statistically significant at the 0.1% level.)

These two graphs are simple illustrations of the statistically significant relationships between wins per million and two political markers of The New Yorker readership. In order to better understand the relationship between these variables, one must return to the structure of the contest…

The mechanism behind the success of liberal, nonreligious states

The caption contest is broken chronologically into three phases: (1) individuals submit captions, (2) three captions are selected as finalists by magazine staff, and (3) the public votes on their favorite caption.

It seems most likely that the mechanism behind the success of liberal, nonreligious states lies in the first phase. In other words, liberal, nonreligious people are more likely to read The New Yorker and/or follow the caption contest. (Its humor is unlikely to resonate with the intensely religious socially conservative.) Therefore, the tendency towards wins for liberal, nonreligious states is mostly a question of who chooses to participate.

It could also be the case that at least a part of the mechanism behind these states’ successes lies in phases (2) or (3). If a piece of this mechanism were snuggled up in phase (2), that would mean The New Yorker staff is inclined, due to an innate sense of liberal humor, to pick captions from specific states. (Yet, since most submissions are probably already from liberals, this seems unlikely–though maybe the reverse happens as the magazine attempts to foster geographic diversity by selecting captions from a broader range of locations? I don’t think that’s part of the caption selection process, but it could be relevant to the aforementioned mechanism if it were.) If the mechanism were instead hidden within the third phase, this would mean voters tend to vote for captions created by people from more nonreligious and liberal states in the country. One interesting element to note is that voters can see the place of residence of a caption creator–though I highly doubt this influences people’s voting choices, it is possible that regional favoritism is a factor (e.g., New Yorkers like to see other New Yorkers win and, therefore, the large number of New Yorker voters pushes the New Yorker caption submissions to win).

In order to better investigate the mechanism behind the success of nonreligious, liberal states, one needs access to the geographic data of all submissions…or, at least the data on the number of subscribers per state. Though one can submit to the contest without a subscription, the latter measure could still be used as a credible proxy for the former since the number of people who submit to the contest in a state is likely proportional to the number of subscribers in the state.

A thank you note

Thanks to my family for giving me a subscription to The New Yorker this past holiday season in a desperate attempt to help me become less of a philistine. My sincerest apologies that I have focused more on the cartoons than all those chunks of words that mark space in between.

How-about-never-cartoon

I’ll be sure to actually call you all up if I ever win–good news: if I enter every contest for the next ten years I’ll have approximately a 10% chance of winning just by chance alone.
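For the curious, the back-of-the-envelope arithmetic behind a figure like that can be sketched in Python. The number of entries per contest is not stated here, so the 5,000 below is purely an assumption, as are the weekly cadence and the (unflattering) premise that every entry is equally likely to win:

```python
def chance_of_at_least_one_win(entries_per_contest, contests_entered):
    """Probability of winning at least once if every entry is equally likely."""
    p_single = 1.0 / entries_per_contest
    return 1.0 - (1.0 - p_single) ** contests_entered


# Assumptions (not from the contest archive): about 5,000 entries per
# contest and one contest per week for ten years.
ENTRIES_PER_CONTEST = 5000
CONTESTS_ENTERED = 52 * 10

p_win = chance_of_at_least_one_win(ENTRIES_PER_CONTEST, CONTESTS_ENTERED)
print(f"{p_win:.1%}")
```

Under those assumptions the result lands at roughly 10%.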

Me & Bob Mankoff (Cartoon Editor of The New Yorker and creator of the above cartoon)


Future work
  • Make maps interactive (using Mapbox/TileMill/QGIS and the like) and embed into page with the help of Sarah Michael Levine!
  • Look at captions per number of subscribers in a state (even though you can submit even if you’re not a subscriber–I assume submissions from a state would be proportional to the number of subscribers)
  • See if it’s possible to collect state data on all submitted captions in order to test hypotheses related to the mechanism behind the success of liberal, nonreligious states
  • Create predictive model with wins per million as the dependent variable
    • Independent variables could include proximity to New York or a dummy variable based on if the state is in northeast, income per capita, percent liberal, percent nonreligious, (use logs?) etc.
      • However, the issue with many of these is that there is likely to be multicollinearity since so many of these independent variables are highly correlated…Food for thought
        • In particular, it is not worthwhile to include both % liberal and % nonreligious in one regression (one loses statistical significance altogether and the other goes from the 0.1% level to the 5% level)
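The multicollinearity concern in the last bullets can be made concrete with the variance inflation factor (VIF). With exactly two predictors, the VIF of each is 1 / (1 - r^2), where r is the correlation between them. Below is a small Python sketch; the correlation values are hypothetical, since the correlation between % liberal and % nonreligious is not reported here:

```python
def variance_inflation_factor(r):
    """VIF for either predictor in a two-predictor regression.

    r is the correlation between the two predictors. Holding the residual
    variance fixed, each coefficient's standard error is inflated by
    sqrt(VIF) relative to the uncorrelated case (r = 0).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


# Hypothetical correlations, for illustration only.
for r in (0.0, 0.5, 0.8, 0.9):
    vif = variance_inflation_factor(r)
    print(f"r = {r:.1f}: VIF = {vif:.2f}, SE inflated by {vif ** 0.5:.2f}x")
```

The closer the two predictors are to perfectly correlated, the larger the inflation, which is one way a coefficient can lose statistical significance once a near-duplicate variable joins the regression.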
Code

All data and R scripts needed to recreate all types of visualizations in this article (choropleths, bar charts, and scatterplots with linear regression lines) are available on my “NewYorker” GitHub repo.


© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

The Rise of the New Kind of Cabbie: A Comparison of Uber and Taxi Drivers

Intro

One day back in the early 2000’s, I commandeered one of my mom’s many spiral notebooks. I’d carry the notebook all around Manhattan, allowing it to accompany me everywhere from pizza parlors to playgrounds, while the notebook waited eagerly for my parents to hail a taxicab so it could fulfill its eventual purpose. Once in a cab, after clicking my seat belt into place (of course!), I’d pull out the notebook in order to develop one of my very first spreadsheets. Not the electronic kind, the paper kind. I made one column for the date of the cab ride, another for the driver’s medallion number (5J31, 3A37, 7P89, etc.) and one last one for the driver’s full name–both the name and number were always readily visible, pressed between two slabs of Plexiglas that intentionally separate the back from the front seat. Taxi drivers always seemed a little nervous when they noticed I was taking down their information–unsure of whether this 8-year-old was planning on calling in a complaint about them to the Taxi and Limousine Commission. I wasn’t planning on it.

Instead, I collected this information in order to discover if I would ever ride in the same cab twice…which I eventually did! On the day that I collected duplicate entries in the second and third columns, I felt an emotional connection to this notebook as it contained a time series of yellow cab rides that ran in parallel with my own development as a tiny human. (Or maybe I just felt emotional because only children can be desperate for friendship, even when it’s friendship with a notebook.) After pages and pages of observations, collected over the years using writing implements ranging from dull pencils to thick Sharpies, I never would have thought that one day yellow cabs would be eclipsed by something else…

Something else

However, today in 2015, according to Taxi and Limousine Commission data, there are officially more Uber cars in New York City than yellow cabs! This is incredible not just because of the speed of Uber’s growth but also since riding with Uber and other similar car services (Lyft, Sidecar) is a vastly different experience from riding in a yellow cab. Never in my pre-Uber life did I think of sitting shotgun. Nor did I consider starting a conversation with the driver. (I most definitely did not tell anyone my name or where I went to school.) Never did my taxi driver need to use an iPhone to get me to my destination. But, most evident to me is the distinction between the identities of the two sets of drivers. In my experience, compared to traditional cab service drivers, Uber drivers seem younger, whiter, more female, and more part-time. Though I have continuously noted these distinctions since growing accustomed to Uber this past summer, I did not think that there was data for illustrating these distinctions quantitatively. However, I recently came across the paper “An Analysis of the Labor Market for Uber’s Driver-Partners in the United States,” written by (economists!) Jonathan Hall and Alan Krueger. The paper supplies tables that summarize characteristics of both Uber drivers and their conventional taxi driver/chauffeur counterparts. This allows for an exercise in visually depicting the differences between the two sets of drivers, which in turn lets us define the characteristics of a new kind of cabbie.

The rise of the younger cabbie
age

Data source: Hall and Krueger (2015). Visualization made using ggplot2.

The above figure illustrates that Uber drivers are noticeably younger than their taxi counterparts. (From here on, when I discuss taxis I am also implicitly including chauffeurs. If you’d like to learn more about the source of the data and the collection methodology, refer directly to the paper.) For one, the age range including the highest percentage of Uber drivers is the 30-39 range (with 30.1% of drivers) while the range including the highest percentage of taxi drivers is the 50-64 range (with 36.6% of drivers). While about 19.1% of Uber drivers are under 30, only about 8.5% of taxi drivers are this young. Similarly, while only 24.5% of Uber drivers are over 50, 44.3% of taxi drivers are over this threshold. This difference in age is not very surprising given that Uber is a technological innovation and, therefore, participation is skewed to younger individuals.

The rise of the more highly educated cabbie
educ

Data source: Hall and Krueger (2015). Visualization made using ggplot2.

This figure illustrates that Uber drivers, on the whole, are more highly educated than their taxi counterparts. While only 12.2% of Uber drivers have no education beyond high school, the majority of taxi drivers (52.5%) fall into this category. The percentage of taxi drivers with at least a college degree is a mere 18.8%, but the percentage of Uber drivers with at least a college degree is 47.7%, which is even higher than that percentage for all workers, 41.1%. Thus, Uber’s rise has created a new class of drivers whose rate of college completion exceeds that of the overall workforce. (Though it is worth noting that the overall workforce boasts a higher percentage of individuals with postgraduate degrees than does Uber–16% to 10.8%.)

The rise of the whiter cabbie
race

Data source: Hall and Krueger (2015). Visualization made using ggplot2.

On the topic of race, conventional taxis boast higher percentages of all non-white racial groups except for the “Other Non-Hispanic” group, which is 3.9 percentage points higher among the Uber population. The most represented race among taxi drivers is black, while the most represented race among Uber drivers is white. 19.5% of Uber drivers are black while 31.6% of taxi drivers are black, and 40.3% of Uber drivers are white while 26.2% of taxi drivers are white. I would be curious to compare the racial breakdown of Uber’s drivers to that of Lyft’s and Sidecar’s drivers as I suspect the other two might not have populations that are as white (simply based on my own small and insufficient sample).

The rise of the female cabbie
gender

Data source: Hall and Krueger (2015). Visualization made using ggplot2.

It has been previously documented how Uber has helped women begin to “break into” the taxi industry. While only 1% of NYC yellow cab drivers are women and 8% of taxi drivers (and chauffeurs) as a whole are women, an impressive 14% of Uber drivers are women–a percentage that is likely only possible in the driving industry due to the safety that Uber provides via the information on its riders.

The rise of the very-part-time cabbie
hours

Data source: Hall and Krueger (2015). Visualization made using ggplot2.

A whopping 51% of Uber drivers drive a mere 1-15 hours per week though only 4% of taxi drivers do so. This distinction in driving times between the two sets of drivers makes it clear that Uber drivers are more likely to be supplementing other sources of income with Uber work, while taxi drivers are more likely to be working as a driver full-time (81% of taxi drivers drive more than 35 hours a week on average, but only 19% of Uber drivers do so). In short, it is very clear that Uber drivers treat driving as more of a part-time commitment than do traditional taxi drivers.
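The two extremes quoted above also pin down how many drivers fall in between. Here is a quick Python sketch of that arithmetic, under the assumption (mine, for illustration) that the hours buckets cover every driver, so that whatever is left over after the 1-15 and 35+ groups belongs to the middle bucket:

```python
# Shares (in percent) quoted in the article for the two extreme buckets.
reported = {
    "Uber": {"1-15 hours": 51, "35+ hours": 19},
    "Taxi": {"1-15 hours": 4, "35+ hours": 81},
}


def implied_middle_share(shares):
    """Share left for the in-between bucket if all buckets sum to 100%."""
    return 100 - sum(shares.values())


for group, shares in reported.items():
    print(group, implied_middle_share(shares))
```

Under that assumption, about 30% of Uber drivers and 15% of taxi drivers land in the in-between range, so even the "middle" of the Uber distribution leans part-time.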

Uber by the cities

As a bonus, beyond profiling the demographic and behavioral differences between the two classes of drivers, I present some information about how Uber drivers differ city by city. While this type of comparison could also be extremely interesting for demographic data (gender, race, etc.), hours worked and earnings are the only available pieces of information profiled by city in Hall and Krueger (2015).

Uber by the cities: hours
cities

Data source: Hall and Krueger (2015). Data on uberX drivers for October 2014. Visualization made using ggplot2.

New York is the city with the smallest share of very-part-time uberX drivers. (Note: This data is only looking at hours worked for uberX drivers in October 2014.) Only 42% work 1-15 hours per week while the percentage for the other cities ranges from 53-59%. Similarly, 23% of NYC Uber drivers work 35+ hours while the percentage for other cities ranges from 12-16%. Though these breakdowns are different for each of the six cities, the figure illustrates that Uber driving is treated pretty uniformly as a part-time gig throughout the country.

Uber by the cities: earnings

Also in the report was a breakdown of median earnings per hour by city. An important caveat here is that these are gross pay numbers and, therefore, they do not take into account the costs of driving a taxi or an Uber. If you’d like to read a quick critique of the paper’s statement that “the net hourly earnings of Uber’s driver-partners exceed the hourly wage of employed taxi drivers and chauffeurs, on average,” read this. However, I will not join this discussion and instead focus only on gross pay numbers since costs are indeed unknown.

earnings

Data source: Hall and Krueger (2015). Uber earnings data from October 2014. Taxi earnings data from May 2013. Visualization made using ggplot2.

According to the report’s information, NYC Uber drivers take in the highest gross earnings per hour ($30.35), followed by SF drivers ($25.77). These are also the two cities in which traditional cabbies make the most. However, while NYC taxi drivers make a few dollars more per hour than those in other cities, NYC Uber drivers make more than 10 dollars per hour more than Boston, Chicago, DC, and LA Uber drivers.

Endnote

There is no doubt that the modern taxi experience is different from the one that I once cataloged in my stout, spiral notebook. Sure, Uber drivers are younger than their conventional cabbie counterparts. They are more often female and more often white. They are more likely to talk to you and tell you about their other jobs or interests. But, the nature of the taxi industry is changing far beyond the scope of the drivers. In particular, information that was once unknown (who took a cab ride with whom and when?) to those not in possession of a taxi notebook is now readily accessible to companies like Uber. Now, this string of recorded Uber rides is just one element in an all-encompassing set of (technologically recorded) sequential occurrences that can at least partially sketch out a skeleton of our lived experiences…No pen or paper necessary.

Bonus: a cartoon!
uberouterspace

The New Yorker Caption Contest for this week with my added caption. The photo was too oddly relevant to my current Uber v. Taxi project for me to not include it!

Future work (all of which requires access to more data)
  • Investigate whether certain age groups for Uber are dominated by a specific race, e.g. is the 18-39 group disproportionately white while the 40+ group is disproportionately non-white?
  • Request data on gender/race breakdowns for Uber and Taxis by city
    • Looking at the racial breakdowns for NYC would be particularly interesting since the NYC breakdown is likely very different from that of cabbies throughout the rest of the country (this data is not available in the Taxicab Fact Book)
  • Compare characteristics by ride-sharing service: Uber, Lyft, and Sidecar
  • Investigate distribution of types of cars driven by Uber, Lyft, and Sidecar (Toyota, Honda, etc.)
Code

All data and R scripts needed to recreate these visualizations are available on my “UbervTaxis” GitHub repo.

