The Rise of the New Kind of Cabbie: A Comparison of Uber and Taxi Drivers

Bar Charts, Stacked Bar Charts

One day back in the early 2000s, I commandeered one of my mom’s many spiral notebooks. I’d carry the notebook all around Manhattan, allowing it to accompany me everywhere from pizza parlors to playgrounds, while the notebook waited eagerly for my parents to hail a taxicab so it could fulfill its eventual purpose. Once in a cab, after clicking my seat belt into place (of course!), I’d pull out the notebook in order to develop one of my very first spreadsheets. Not the electronic kind, the paper kind. I made one column for the date of the cab ride, another for the driver’s medallion number (5J31, 3A37, 7P89, etc.) and one last one for the driver’s full name–both the name and number were always readily visible, pressed between two slabs of Plexiglas that intentionally separate the back from the front seat. Taxi drivers always seemed a little nervous when they noticed I was taking down their information–unsure of whether this 8-year-old was planning on calling in a complaint about them to the Taxi and Limousine Commission. I wasn’t planning on it.

Instead, I collected this information in order to discover if I would ever ride in the same cab twice…which I eventually did! On the day that I collected duplicate entries in the second and third columns, I felt an emotional connection to this notebook as it contained a time series of yellow cab rides that ran in parallel with my own development as a tiny human. (Or maybe I just felt emotional because only children can be desperate for friendship, even when it’s friendship with a notebook.) After pages and pages of observations, collected over the years using writing implements ranging from dull pencils to thick Sharpies, I never would have thought that one day yellow cabs would be eclipsed by something else…

Something else

However, today in 2015, according to Taxi and Limousine Commission data, there are officially more Uber cars in New York City than yellow cabs! This is incredible not just because of the speed of Uber’s growth but also since riding with Uber and other similar car services (Lyft, Sidecar) is a vastly different experience than riding in a yellow cab. Never in my pre-Uber life did I think of sitting shotgun. Nor did I consider starting a conversation with the driver. (I most definitely did not tell anyone my name or where I went to school.) Never did my taxi driver need to use an iPhone to get me to my destination. But, most evident to me is the distinction between the identities of the two sets of drivers. It is undoubtedly obvious that compared to traditional cab service drivers, Uber drivers are younger, whiter, more female, and more part-time. Though I have continuously noted these distinctions since growing accustomed to Uber this past summer, I did not think that there was data for illustrating these distinctions quantitatively. However, I recently came across the paper “An Analysis of the Labor Market for Uber’s Driver-Partners in the United States,” written by (Economists!) Jonathan Hall and Alan Krueger. The paper supplies tables that summarize characteristics of both Uber drivers and their conventional taxi driver/chauffeur counterparts. This allows for an exercise in visually depicting the differences between the two opposing sets of drivers—allowing us to then accurately define the characteristics of a new kind of cabbie.  

The rise of the younger cabbie


The above figure illustrates that Uber drivers are noticeably younger than their taxi counterparts. (From here on, when I discuss taxis I am also implicitly including chauffeurs. If you’d like to learn more about the source of the data and the collection methodology, refer directly to the paper.) For one, the age range including the highest percentage of Uber drivers is the 30-39 range (with 30.1% of drivers) while the range including the highest percentage of taxi drivers is the 50-64 range (with 36.6% of drivers). While about 19.1% of Uber drivers are under 30, only about 8.5% of taxi drivers are this young. Similarly, while only 24.5% of Uber drivers are over 50, 44.3% of taxi drivers are over this threshold. This difference in age is not very surprising given that Uber is a technological innovation and, therefore, participation is skewed to younger individuals.

The rise of the more highly educated cabbie


This figure illustrates that Uber drivers, on the whole, are more highly educated than their taxi counterparts. While only 12.2% of Uber drivers do not possess a level of education beyond high school completion, the majority of taxi drivers (52.5%) fall into this category. The percentage of taxi drivers with at least a college degree is a mere 18.8%, but the percentage of Uber drivers with at least a college degree is 47.7%, which is even higher than that percentage for all workers, 41.1%. Thus, Uber’s rise has created a new class of drivers whose higher education level is superior to that of the overall workforce. (Though it is worth noting that the overall workforce boasts a higher percentage of individuals with postgraduate degrees than does Uber–16% to 10.8%.)

The rise of the whiter cabbie


On the topic of race, conventional taxis boast higher percentages of all non-white racial groups except for the “Other Non-Hispanic” group, which is 3.9 percentage points higher among the Uber population. The most represented race among taxi drivers is black, while the most represented race among Uber drivers is white. 19.5% of Uber drivers are black while 31.6% of taxi drivers are black, and 40.3% of Uber drivers are white while 26.2% of taxi drivers are white. I would be curious to compare the racial breakdown of Uber’s drivers to that of Lyft and Sidecar’s drivers as I suspect the other two might not have populations that are as white (simply based on my own small and insufficient sample size).

The rise of the female cabbie


It has been previously documented how Uber has helped women begin to “break into” the taxi industry. While only 1% of NYC yellow cab drivers are women and 8% of taxi drivers (and chauffeurs) as a whole are women, an impressive 14% of Uber drivers are women–a percentage that is likely only possible in the driving industry due to the safety that Uber provides via the information on its riders.

The rise of the very-part-time cabbie


A whopping 51% of Uber drivers drive a mere 1-15 hours per week, though only 4% of taxi drivers do so. This distinction in driving times between the two sets of drivers makes it clear that Uber drivers are more likely to be supplementing other sources of income with Uber work, while taxi drivers are more likely to be working as drivers full-time (81% of taxi drivers drive more than 35 hours a week on average, but only 19% of Uber drivers do so). In short, it is very clear that Uber drivers treat driving as more of a part-time commitment than do traditional taxi drivers.

Uber by the cities

As a bonus, beyond profiling the demographic and behavioral differences between the two classes of drivers, I present some information about how Uber drivers differ city by city. While this type of comparison could also be extremely interesting for demographic data (gender, race, etc.), hours worked and earnings are the only available pieces of information profiled by city in Hall and Krueger (2015).

Uber by the cities: hours


New York is the city with the smallest share of part-time uberX drivers. (Note: These data cover only hours worked by uberX drivers in October 2014.) Only 42% work 1-15 hours, while the percentage for the other cities ranges from 53-59%. Similarly, 23% of NYC Uber drivers work 35+ hours, while the percentage for other cities ranges from 12-16%. Though these breakdowns are different for each of the six cities, the figure illustrates that Uber driving is treated pretty uniformly as a part-time gig throughout the country.

Uber by the cities: earnings

Also in the report was a breakdown of median earnings per hour by city. An important caveat here is that these are gross pay numbers and, therefore, they do not take into account the costs of driving a Taxi or an Uber. If you’d like to read a quick critique of the paper’s statement that “the net hourly earnings of Uber’s driver-partners exceed the hourly wage of employed taxi drivers and chauffeurs, on average,” read this. However, I will not join this discussion and instead focus only on gross pay numbers since costs are indeed unknown.


According to the report’s information, NYC Uber drivers take in the highest gross earnings per hour ($30.35), followed by SF drivers ($25.77). These are also the two cities in which traditional cabbies make the most. However, while NYC taxi drivers make a few dollars more per hour than those in other cities, NYC Uber drivers make more than 10 dollars per hour more than Boston, Chicago, DC, and LA Uber drivers.


There is no doubt that the modern taxi experience is different from the one that I once cataloged in my stout, spiral notebook. Sure, Uber drivers are younger than their conventional cabbie counterparts. They are more often female and more often white. They are more likely to talk to you and tell you about their other jobs or interests. But, the nature of the taxi industry is changing far beyond the scope of the drivers. In particular, information that was once unknown (who took a cab ride with whom and when?) to those not in possession of a taxi notebook is now readily accessible to companies like Uber. Now, this string of recorded Uber rides is just one element in an all-encompassing set of (technologically recorded) sequential occurrences that can at least partially sketch out a skeleton of our lived experiences…No pen or paper necessary.

Bonus: a cartoon!

The New Yorker Caption Contest for this week with my added caption. The photo was too oddly relevant to my current Uber v. Taxi project for me to not include it!

 Future work (all of which requires access to more data)
  • Investigate whether certain age groups for Uber are dominated by a specific race, e.g. is the 18-39 group disproportionately white while the 40+ group is disproportionately non-white?
  • Request data on gender/race breakdowns for Uber and Taxis by city
    • Looking at the racial breakdowns for NYC would be particularly interesting since the NYC breakdown is likely very different from that of cabbies throughout the rest of the country (this data is not available in the Taxicab Fact Book)
  • Compare characteristics by ride-sharing service: Uber, Lyft, and Sidecar
  • Investigate distribution of types of cars driven by Uber, Lyft, and Sidecar (Toyota, Honda, etc.)

The R notebook for replicating all visuals is available here. See full github repo for the data as well. (Both updated 7-27-17)

© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.

(TGI)Friday the 13th

Bar Charts, Heat Maps, Network Visualizations

I was born on November 13th, 1992. And, while that might be sufficient information to verify my identity to that pre-recorded human on the other end of my calls to the bank, there is an important detail always excluded in this formal expression of one’s birth date: the day of the week. And what makes this seemingly unremarkable fact about me exciting (or disturbing, depending on your take) is that pesky, frequently ignored detail. I was born on a Friday.

Yes, Friday the 13th, the universally proclaimed day of bad luck. Black cats, the devil, all that jazz. Despite the fact that it was the strong but random force of chance, not some meaningful destiny, that sealed my birth date, I still feel a personal tie to the dark and twisty combination. And, given this past February’s Friday the 13th immediately followed by another, today(!), in March, I decided to revisit an old question: how frequent is Friday the 13th anyway? And how often does this February-March combination happen? Are there other regular month combinations?


In response to the first question: in the long run, Friday is actually the day of the week most likely to fall on the 13th! (Just by a tiny bit…the probability that the 13th falls on a Friday is 0.1433, while the probabilities for the other days are 0.1425 (Thursday & Saturday), 0.1427 (Monday & Tuesday), and 0.1431 (Wednesday & Sunday).) Over the past thirty years, the average number of Friday the 13th’s in a year was approximately 1.74 (which is a higher average than expected if one assumed there was exactly a 1/7th chance of a Friday the 13th every month of every year: (1/7)*12=1.714). See below for a visualization of Friday the 13th (or F13 from now on) frequencies over the past three decades:


Plot made with ggplot2 package in R.
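The long-run probabilities quoted above can be checked with a short script. This is only a sketch in Python's standard library (not the R code behind the plot), and it assumes that "long run" means one full 400-year cycle of the Gregorian calendar, after which the weekday pattern repeats. The function name and the choice of 2001 as the starting year are mine, not the author's.

```python
from collections import Counter
from datetime import date

# Names in the order used by date.weekday(), where Monday is 0.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_counts_of_13th(start_year=2001, n_years=400):
    """Count which weekday the 13th falls on, for every month in the span."""
    counts = Counter()
    for year in range(start_year, start_year + n_years):
        for month in range(1, 13):
            counts[DAY_NAMES[date(year, month, 13).weekday()]] += 1
    return counts

counts = weekday_counts_of_13th()
total = sum(counts.values())  # 400 years * 12 months = 4800 thirteenths

# Print each weekday with its count and its share of all 13ths.
for day, n in counts.most_common():
    print(f"{day:<9} {n:4d}  {n / total:.4f}")
```

Friday comes out on top with 688 of the 4800 thirteenths, which is where the 0.1433 figure comes from.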

It is obvious from this graph that each of the past thirty years contains at least one F13. This is because it is actually impossible to have a year without any F13’s, which can be proven quickly. (See proof below. Or skip ahead if that’s not your thing.)

Quick proof

Quoting directly from a StackExchange solution:

A month has a Friday 13th if and only if it begins on a Sunday.

On a regular (non-leap) year, if January begins on day $k$, $0 \le k \le 6$ (with $k=0$ being Sunday), then we have that:

  • January begins on day $k$;
  • February begins on day $k+3 \bmod 7$ (since January has 31 days, and $31 \equiv 3 \pmod 7$);
  • March begins on day $k+3 \bmod 7$;
  • April begins on day $k+6 \bmod 7$;
  • May begins on day $k+8 \equiv k+1 \pmod 7$ (since April has 30 days, and $30 \equiv 2 \pmod 7$);
  • June begins on day $k+4 \bmod 7$;
  • July begins on day $k+6 \bmod 7$;
  • August begins on day $k+9 \equiv k+2 \pmod 7$;
  • September begins on day $k+5 \bmod 7$.

With these, we already have day k, k+1, k+2, k+3, k+4, k+5, and k+6, so at least one of these months will begin on Sunday, guaranteeing at least one Friday 13th.

For Leap years, the analysis is similar, except that:

  • January begins on day k;
  • February begins on day k+3;
  • March begins on day k+4;
  • April begins on day k;
  • May begins on day k+2;
  • June begins on day k+5;
  • July begins on day k;
  • August begins on day k+3;
  • September begins on day k+6;
  • October begins on day k+1.

So at the latest, you will have a Friday 13th by October.
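For readers who prefer checking to proving, the argument above can be brute-forced. The sketch below (in Python, with function names of my own choosing) tries all seven possible starting weekdays for January, in both regular and leap years, and confirms that some month always begins on a Sunday.

```python
# Month lengths in a regular (non-leap) year, January first.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def month_start_days(k, leap=False):
    """Weekday (0 = Sunday) on which each month begins if January begins on day k."""
    starts = []
    day = k
    for month, length in enumerate(DAYS_IN_MONTH):
        starts.append(day % 7)
        if leap and month == 1:  # February gains a day in leap years
            length = 29
        day += length
    return starts

def has_friday_13th(k, leap=False):
    # A month contains a Friday the 13th exactly when it begins on a Sunday (day 0).
    return 0 in month_start_days(k, leap)

# Try every starting weekday for both kinds of year.
all_covered = all(has_friday_13th(k, leap)
                  for k in range(7) for leap in (False, True))
print(all_covered)
```

With `k = 0`, `month_start_days` reproduces the offsets listed in the proof (0, 3, 3, 6, 1, 4, 6, 2, 5 through September in a regular year).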

Less proofs, more visuals

The distribution of F13’s across months is very even over the past thirty years; six months contained a F13 four times, while the other six months contained one five times. However, not every pair of months is equally likely to have F13’s in the same year. For instance, in the past thirty years, March and November both had F13’s in five of the same years, while March and May never had a F13 in the same year. One can verify this claim that certain month combinations are more frequent than others by inspecting the following heat map visualization:


A box is red if that month-year combination included a Friday the 13th and grey otherwise. Plot made with ggplot2 package in R.
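The month-year grid behind a heat map like this one can be rebuilt from the calendar alone. The following is a rough Python sketch rather than the ggplot2 code used for the figure. It assumes that "the past thirty years" means 1986 through 2015, which is my reading and not something the post states, so results may differ from the figure if the author used another window. The helper name `f13_months` is hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import combinations

def f13_months(year):
    """Months (1-12) of the given year whose 13th falls on a Friday."""
    return [m for m in range(1, 13) if date(year, m, 13).weekday() == 4]

YEARS = range(1986, 2016)  # assumed thirty-year window, 1986-2015 inclusive
by_year = {year: f13_months(year) for year in YEARS}

# Text version of the heat map: one row per year, X marks a Friday the 13th.
for year, months in by_year.items():
    row = "".join("X" if m in months else "." for m in range(1, 13))
    print(year, row)

# How often each pair of months had a Friday the 13th in the same year.
pair_counts = Counter()
for months in by_year.values():
    pair_counts.update(combinations(months, 2))

three_f13_years = [year for year, months in by_year.items() if len(months) == 3]
print(pair_counts[(3, 11)], pair_counts[(3, 5)], three_f13_years)
```

Under that window, `pair_counts[(3, 11)]` is 5 and `pair_counts[(3, 5)]` is 0, in line with the March-November and March-May examples above.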

Most evident in the above plot is the February-March-November combination–immediately noticeable since February and March are right next to one another. It turns out that our current year, 2015, is one of five in the past thirty with three F13’s, and also one of four years (1987, 1998, 2009, 2015) in the past thirty featuring the February-March-November combination.

While the February-March-November combination is the most frequent trio in the past thirty years (the only other one being January-April-July in 2012), there are duos that are just as frequent during this time period. In order to best see all the combinations of months that have had F13’s in the same year, I created a network visualization. This network features a bidirectional edge between two months (represented by red vertices) if they have both had a F13 in the same year. Alternatively, a vertex can also have a loop (an edge that connects it to itself) if it was the only month in a year to feature a F13.


Network depicting edges between month vertices if the two have included a Friday the 13th in the same year. A loop is shown if a given month has been the only month in any year to have a Friday the 13th. Network made using the igraph package in R.

(There are no weights for the edges in this network. Instead, an edge between two vertices, a and b, is determined by the binary response to: in a given year, have months a and b ever both included a Friday the 13th? If no, no edge exists between a and b. If yes, an edge exists between a and b. However, one could easily extend this work by weighting the edges to depict frequencies of various month combinations.)
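The edge rule described above is simple enough to sketch without a graph library. The Python snippet below is an illustration and not the igraph code behind the figure. It again assumes the window 1986-2015, and `build_network` is a hypothetical helper name. Edges are stored as unordered month pairs, and loops as the set of months that were ever the sole F13 month in a year.

```python
from datetime import date
from itertools import combinations

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def f13_months(year):
    """Months (1-12) of the given year whose 13th falls on a Friday."""
    return [m for m in range(1, 13) if date(year, m, 13).weekday() == 4]

def build_network(years):
    """Return (edges, loops) for the unweighted month network."""
    edges, loops = set(), set()
    for year in years:
        months = f13_months(year)
        if len(months) == 1:
            loops.add(months[0])  # sole F13 month of the year: self-loop
        else:
            edges.update(combinations(months, 2))  # every pair shares the year
    return edges, loops

edges, loops = build_network(range(1986, 2016))  # assumed window

for a, b in sorted(edges):
    print(MONTH_NAMES[a - 1], "--", MONTH_NAMES[b - 1])
print("loops:", [MONTH_NAMES[m - 1] for m in sorted(loops)])
```

Swapping the `set` of edges for a `collections.Counter` would give the edge weights mentioned in the parenthetical.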

This network breaks down into four smaller graphs of sizes 1, 1, 5, and 7 (size in a graph is defined as the number of edges). From this network, one can see clearly that May is the only month in the past thirty years to have never shared a F13 with another month in the same year. Also, over this time period, September had a F13 if and only if December did as well. Lastly, January, February, and April are the months that have shared a F13 with the most other months.


Despite the fact that there is nothing mathematically extraordinary about occurrences of November Friday the 13th’s, I still hold tight to my personal connection to the date. While November Friday the 13th is not incredibly rare, it did turn out that 1992 was the only year in the past thirty to have a November Friday the 13th without both February and March Friday the 13th’s preceding it. (See the heat map!) And if that’s a fact I can use to infuse some sort of mathematical significance into my birthday then I am using it. After all, it is this very date that has given me an affinity for a number usually considered substandard as well as the birthright to scoff, personally offended, when an apartment building elevator disrespectfully skips from the 12th to the 14th floor.

Future work
  • Add weights to edges to depict frequencies of various month combinations
  • Use network centrality methods paired with edge weights to determine the months that are the most central. (At this moment, using simple degree centrality measures, the most central months would be January, February, and April–with June included if you count loops towards degree measures.)

All data and R scripts needed to recreate these visualizations are available on my “Fri13” Github repo.
