The One With All The Quantifiable Friendships, Part 2


Since finishing my first year of my PhD, I have been spending some quality time with my computer. Sure, the two of us had been together all throughout the academic year, but we weren’t doing much together besides pdf-viewing and type-setting. Around spring break, when I discovered you can in fact freeze your computer by having too many exams/section notes/textbooks simultaneously open, I promised my MacBook that over the summer we would try some new things together. (And that I would take out her trash more.) After that promise and a new sticker addition, she put away the rainbow wheel.

Cut to a few weeks ago. I had a blast from the past in the form of a Twitter notification. Someone had written a post about using R to analyze the TV show Friends, motivated by an interest similar to the one that drove me to write something about the show using my own dataset back in 2015. In the post, the author, Giora Simchoni, used R to scrape the scripts for all ten seasons of the show and made all that work publicly available (wheeeeee) for all to peruse. In fact, Giora even used some of the data I shared back in 2015 to look into character centrality. (He makes a convincing case using a variety of data sources that Rachel is the most central friend of the six main characters.) In reading about his project, I could practically hear my laptop humming to remind me of its freshly updated R software and my recent tinkering with R notebooks. (Get ready for new levels of reproducibility!) So, off my Mac and I went, equipped with a new workflow, to explore new data about a familiar TV universe.

Who’s Doing The Talking?

Given line-by-line data on all ten seasons, I, like Giora, first wanted to look at line totals for all characters. Aggregating all non-“friends” characters together, we get the following snapshot:


First off, why yes, I am using the official Friends font. Second, I am impressed by how close the totals are for all characters, though hardly surprised that Phoebe has the fewest lines. Rachel wouldn’t be surprised either…

Rachel: Ugh, it was just a matter of time before someone had to leave the group. I just always assumed Phoebe would be the one to go.

Phoebe: Ehh!!

Rachel: Honey, come on! You live far away! You’re not related. You lift right out.

With these aggregates in hand, I then was curious: how would line allocations look across time? So, for each episode, I calculate the percentage of lines that each character speaks, and present the results with the following three visuals (again, all non-friends go into the “other” category):


Tell me that first graph doesn’t look like a callback to Rachel’s English Trifle. Anyway, regardless of a possible trifle-like appearance, all the visuals illustrate the dynamics of an ensemble cast; while there is noise in the time series, the show consistently provides each character with a role to play. However, the last visual does highlight some standouts in the collection: episodes that uncharacteristically highlight or ignore certain characters. In other words, there are episodes in which one member of the cast receives an unusually high or low percentage of the lines in the episode. The three episodes that boast the highest percentages for a single member of the gang are: “The One with Christmas in Tulsa” (41.9% Chandler), “The One With Joey’s Interview” (40.3% Joey), and “The One Where Chandler Crosses a Line” (36.3% Chandler). Similarly, the three with the lowest percentages for one of the six are: “The One With The Ring” (1.5% Monica), “The One With The Cuffs” (1.6% Ross), and “The One With The Sonogram At The End” (3.3% Joey). The sagging red lines of the last visual identify episodes that have a low percentage of lines spoken by characters outside of the friend group. In effect, those dips in the graph point to extremely six-person-centric episodes, such as “The One On The Last Night” (0.4% non-friends dialogue–a single line in this case), “The One Where Chandler Gets Caught” (1.1% non-friends dialogue), and “The One With The Vows” (1.2% non-friends dialogue).
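
For anyone who wants to see the per-episode calculation spelled out, here is a minimal sketch. It is written in Python rather than the R used for the project, and it is not the notebook's code: the `(episode_id, speaker)` input format, the `line_shares` name, and the toy lines are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# The six main characters; everyone else is pooled into "Other".
FRIENDS = {"Monica", "Rachel", "Phoebe", "Ross", "Chandler", "Joey"}


def line_shares(lines):
    """Percentage of lines per character within each episode.

    `lines` is an iterable of (episode_id, speaker) pairs, one pair per
    spoken line (a hypothetical format, not the scraped dataset's schema).
    Returns {episode_id: {character_or_Other: percent_of_lines}}.
    """
    counts = defaultdict(Counter)
    for episode, speaker in lines:
        label = speaker if speaker in FRIENDS else "Other"
        counts[episode][label] += 1

    shares = {}
    for episode, counter in counts.items():
        total = sum(counter.values())
        shares[episode] = {who: 100 * n / total for who, n in counter.items()}
    return shares


# Made-up toy data: four lines in one episode, two in another.
toy_lines = [
    ("s01e01", "Monica"), ("s01e01", "Ross"),
    ("s01e01", "Gunther"), ("s01e01", "Ross"),
    ("s01e02", "Joey"), ("s01e02", "Joey"),
]
shares = line_shares(toy_lines)
print(shares["s01e01"])
```

Sorting each episode's shares by value would then surface the standout episodes, whether unusually high for one friend or unusually low for the "Other" group.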

The Men Vs. The Women

Given this title, here’s a quick necessary clip:

Now, how do the line allocations look when broken down by gender lines across the main six characters? Well, the split consistently bounces around 50-50 over the course of the 10 seasons. Again, as was the case across the six main characters, the balanced split of lines is pretty impressive.


Note that the second visual highlights that there are a few episodes that are irregularly man-heavy. The top three are: “The One Where Chandler Crosses A Line” (77.0% guys), “The One With Joey’s Interview” (75.1% guys), and “The One With Mac and C.H.E.E.S.E.” (70.2% guys). There are also exactly two episodes that feature a perfect 50-50 split for lines across gender: “The One Where Rachel Finds Out” and “The One With The Thanksgiving Flashbacks.”

Say My Name

How much do the main six characters address or mention one another? Giora addressed this question in his post, and I build off of his work by including nicknames in the calculations and using a different genre of visualization. With respect to the nicknames–“Mon”, “Rach”, “Pheebs”, and “Joe”–“Pheebs” is undoubtedly the stickiest of the group. Characters say “Pheebs” 370 times, which gives it a comfortable cushion over the second-place nickname “Mon” (used 73 times). Characters also significantly differ in their usage of each other’s nicknames. For example, while Joey calls Phoebe “Pheebs” 38.3% of the time, Monica calls her by this nickname only 4.6% of the time. (If you’re curious about more numbers on the nicknames, check out the project notebook.)
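
To make the nickname counting concrete, here is a rough Python sketch (again, not the R from the notebook). It assumes lines arrive as `(speaker, text)` pairs and that a mention is any case-sensitive whole-word match of a name or nickname; the function names and toy lines are invented for illustration, and the notebook may define "calls her Pheebs X% of the time" differently.

```python
import re
from collections import Counter

# Nicknames from the post, mapped to the character they refer to.
NICKNAMES = {"Mon": "Monica", "Rach": "Rachel", "Pheebs": "Phoebe", "Joe": "Joey"}
FULL_NAMES = ["Monica", "Rachel", "Phoebe", "Ross", "Chandler", "Joey"]

# Whole-word matching, so "Mon" does not fire on "Monica" or "Monday".
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in FULL_NAMES + list(NICKNAMES)) + r")\b"
)


def name_mentions(lines):
    """Count mentions keyed by (speaker, person, form), form in {'full', 'nickname'}."""
    mentions = Counter()
    for speaker, text in lines:
        for word in NAME_PATTERN.findall(text):
            if word in NICKNAMES:
                mentions[(speaker, NICKNAMES[word], "nickname")] += 1
            else:
                mentions[(speaker, word, "full")] += 1
    return mentions


def nickname_rate(mentions, speaker, person):
    """Percent of a speaker's mentions of `person` that use the nickname."""
    nick = mentions[(speaker, person, "nickname")]
    full = mentions[(speaker, person, "full")]
    total = nick + full
    return 100 * nick / total if total else None


# Made-up toy lines, not quotes from the scripts.
toy_lines = [
    ("Joey", "Hey Pheebs, have you seen Joey's sandwich?"),
    ("Joey", "Phoebe, come on!"),
    ("Monica", "Phoebe, it is Monday."),
]
mentions = name_mentions(toy_lines)
print(nickname_rate(mentions, "Joey", "Phoebe"))
```

Summing the counts over both forms for each (speaker, person) pair gives the who-says-whose-name totals, including self-references. One caveat of this simple approach: a bare "Joe" is always credited to Joey, even if a script happens to mean someone else.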

Now, after adding in the nicknames, who says whose name? The following graphic addresses that point of curiosity:


The answer is clear: Rachel says Ross’s name the most! (789 times! OK, we get it, Rachel, you’re in love.) We can also see that Joey is the most self-referential with 242 usages of his own name–perhaps not a shock considering his profession in the entertainment biz. Overall, the above visual provides some data-driven evidence of the closeness between certain characters that is clearly evident in watching the show. Namely, the Joey-Chandler, Monica-Chandler, Ross-Rachel relationships that were evident in my original aggregation of shared plot lines are still at the forefront!


Comparing the above work to what I had originally put together in January 2015 is a real trip. My original graphics back in 2015 were made entirely in Excel and were as such completely unreproducible, as was the data collection process. The difference between the opaqueness of that process and the transparency of sharing notebook output is super exciting to me… and to my loyal MacBook. Yes, yes, I’ll give you another sticker soon.

Let’s see the code!

Here is the HTML-rendered R Notebook for this project. Here is the GitHub repo with the markdown file included.

*Screen fades to black* 
Executive Producer: Alex Albright

© Alexandra Albright and The Little Dataset That Could, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.



Go East, young woman

We’ll always have Palo Alto[1]

It is 9:30pm PST on Friday evening and my seat belt is buckled. The lights are a dim purple as they always are on Virgin America flights. As if we are all headed off to a prom on the opposite side of the country together. My favorite safety video in the industry starts to play–an accumulation of visuals and beats that usually gives me a giddy feeling that only Beyoncé videos have the power to provoke–however, in this moment, I begin to tear up despite the image of a shimmying nun displayed in front of me. In my mind, overlaying the plane-inspired choreography is a projection of Rick Blaine reminding me in my moments of doubt that I belong on this plane [2]: “If that plane leaves the ground and you’re not [in it], you’ll regret it. Maybe not today. Maybe not tomorrow, but soon and for the rest of your life.” I whisper “here’s looking at you, kid” to the screen now saturated with dancing flight attendants and fade into a confused dreamscape: Silicon Valley in black and white–founders still wear hoodies, but they have tossed on hats from the ’40s.

A few days later, I am now living in Cambridge, MA. While my senses are overcome by a powerful ensemble of changes, some more discreet or intangible than others, there is one element of the set that is clear, striking, and quantifiable: the thickness and heat in the air that were missing from Palo Alto and San Francisco. After spending a few nights out walking (along rivers, across campuses, over and under bridges, etc.) in skirts and sandals without even the briefest longing for a polar fleece, I am intent on documenting the difference between Boston and San Francisco temperatures. Sure, I can’t quantify every dimension of change that I experience, but, hey, I can chart temperature differences.

Coding up weather plots

In order to investigate the two cities and their relevant weather trends, I adapted some beautiful code that was originally written by Bradley Boehmke in order to generate Tufte-inspired weather charts using R (specifically making use of the beloved ggplot2 package). The code is incredible in how simple it is to apply to any of the cities that have data from the University of Dayton’s Average Daily Temperature archive.[3] Below are the results I generated for SF and Boston, respectively[4]:



While one could easily just plot the recent year’s temperature data (2015, as marked by the black time series, in this case), it is quickly evident that making use of historical temperature data helps to both smooth over the picture and put 2015 temperatures in context. The light beige for each day in the year shows the range from historical lows to historical highs in the time period of 1995-2014. Meanwhile, the grey range presents the 95% confidence interval around daily mean temperatures for that same time period. Lastly, the blue and red dots mark the days in 2015 that were record lows or highs relative to the past two decades. While Boston had a similar number of red and blue dots for 2015, SF is overpowered by red. Almost 12% of SF days were record highs relative to the previous twenty years. Only one day was a record low.
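
The ingredients of each chart (historical range, confidence band, record dots) boil down to a few per-day summaries. Here is a minimal Python sketch of that logic; it is not Bradley Boehmke's R code or my adaptation of it. It assumes a t-based interval around the daily mean, treats a record as strictly exceeding the historical extreme, and uses made-up temperatures; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def daily_summary(past_temps, t_crit=2.093):
    """Summarize one calendar day's temperatures across past years.

    `t_crit` is the two-sided 95% t critical value; the default 2.093
    corresponds to 20 years of data (19 degrees of freedom), matching a
    1995-2014 window. Pass a different value for other sample sizes.
    """
    n = len(past_temps)
    avg = mean(past_temps)
    se = stdev(past_temps) / sqrt(n)  # standard error of the daily mean
    return {
        "low": min(past_temps),            # bottom of the beige band
        "high": max(past_temps),           # top of the beige band
        "avg": avg,
        "ci_lower": avg - t_crit * se,     # bottom of the grey band
        "ci_upper": avg + t_crit * se,     # top of the grey band
    }


def record_flag(current_temp, summary):
    """Label a current-year reading as a record high, record low, or neither."""
    if current_temp > summary["high"]:
        return "record high"  # red dot
    if current_temp < summary["low"]:
        return "record low"   # blue dot
    return None


# Made-up readings for one calendar day over five past years
# (t critical value for 4 degrees of freedom is 2.776).
toy_summary = daily_summary([50, 52, 54, 56, 58], t_crit=2.776)
print(toy_summary)
print(record_flag(60, toy_summary))
```

Running `daily_summary` once per calendar day and `record_flag` once per day of the current year gives everything needed to count red and blue dots for a city.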

While this style of visualization is primarily intuitive for comparing a city’s weather to its own historical context, there are also a few quick points that strike me from simple comparisons across the two graphs. I focus on just three quick concepts that are borne out by the visuals:

  1. Boston’s seasons are unmistakable.[5] While the normal range (see darker swatches on the graph) of temperatures for SF varies between 50 (for winter months) and 60 degrees (for late summer and early fall months), the normal range for Boston is notably larger and ranges from the 30s (winter and early spring months) to the 70s (summer months). The contrast in the curves of the two graphs makes this difference throughout the months painfully obvious. San Francisco’s climate is incredibly stable in comparison with east coast cities–a fact that is well known, but still impressive to see in visual form!
  2. There’s a reason SF can have Ultimate Frisbee Beach League in the winter. Consider the relative wonderfulness of SF in comparison to Boston during the months of January to March. In 2015, SF ranged from 10 to 55 degrees (on a particularly toasty February day) warmer than Boston for those months. In general, most differences on a day-to-day basis are around +20 to +40 degrees for SF.
  3. SF Summer is definitely ‘SF Winter’ if one defines its temperature relative to that of other climates. In 2015, the summer months in SF were around 10 degrees colder than the summer months in Boston. While SF summer is warmer than actual SF winter in absolute terms, comparing the temperatures to those of other areas of the country quickly shows SF summer to be the relatively chilliest stretch of the year.
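
The cross-city comparisons in the list above come down to subtracting one city's daily temperature from the other's and averaging by month. A small Python sketch of that arithmetic follows; the `(month, day)` keyed dictionaries, the function names, and the temperatures are all invented for illustration and are not the Dayton archive's format or values.

```python
from collections import defaultdict
from statistics import mean


def daily_gaps(city_a, city_b):
    """Temperature of city_a minus city_b for every date both cities share.

    Each argument maps (month, day) -> temperature in degrees Fahrenheit.
    """
    shared_dates = sorted(set(city_a) & set(city_b))
    return {date: city_a[date] - city_b[date] for date in shared_dates}


def monthly_mean_gap(gaps):
    """Average the daily gaps within each month."""
    by_month = defaultdict(list)
    for (month, _day), gap in gaps.items():
        by_month[month].append(gap)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up readings: positive gaps mean SF was warmer than Boston.
toy_sf = {(1, 1): 55, (1, 2): 57, (7, 1): 60}
toy_boston = {(1, 1): 25, (1, 2): 31, (7, 1): 72, (7, 2): 75}

toy_gaps = daily_gaps(toy_sf, toy_boston)
toy_monthly = monthly_mean_gap(toy_gaps)
print(toy_monthly)
```

Dates missing from either city are simply dropped, which is one reasonable choice among several for handling gaps in the data.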

Of course, it is worth noting that the picture from looking at simple temperature alone is not complete. More interesting than this glance at basic temperature would be an investigation into the “feels like” temperature, which usually takes into account factors such as wind speeds and humidity. Looking into these more complex measurements would very likely heighten the clear distinction in Boston seasons as well as potentially strengthen the case for calling SF summer ‘SF winter’, given the potentially stronger presence of wind chill during the summer months.[6]

The coldest winter I ever spent…[7]

It is 6:00am EST Saturday morning in Boston, MA. Hot summer morning is sliced into by divine industrial air conditioning. Hypnotized by luggage seemingly floating on the baggage claim conveyor belt and slowly emerging from my black and white dreams, I wonder if Ilsa compared the weather in Lisbon to that in Casablanca when she got off her plane… after contacts render the lines and angles that compose my surroundings crisp again, I doubt it. Not only because Ilsa was probably still reeling from maddeningly intense eye contact with Rick, but also because Lisbon and Morocco are not nearly as markedly different in temperature as are Boston and San Francisco.

Turns out that the coldest winter I will have ever spent will be winter in Boston. My apologies to summer in San Francisco.


[1] Sincere apologies to those of you in the Bay Area who have had to hear me make this joke a few too many times over the past few weeks.

[2] Though definitely not to serve as a muse to some man named Victor. Ah, yes, the difference 74 years can make in the purpose of a woman’s travels.

[3] Taking your own city’s data for a spin is a great way to practice getting comfortable with R visualization if you’re into that sort of thing.

[4] See my adapted R code for SF and Boston here. Again, the vast majority of credit goes to Bradley Boehmke for the original build.

[5] Speaking of seasons

[6] I’d be interested to see which US cities have the largest quantitative difference between “feels like” and actual temperature for each period (say, month) of the year…

[7] From a 2005 Chronicle article: “‘The coldest winter I ever spent was a summer in San Francisco,’ a saying that is almost a San Francisco cliche, turns out to be an invention of unknown origin, the coolest thing Mark Twain never said.”

© Alexandra Albright and The Little Dataset That Could, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts, accompanying visuals, and links may be used, provided that full and clear credit is given to Alex Albright and The Little Dataset That Could with appropriate and specific direction to the original content.