Bad maps and artificial boundaries

An infographic appeared on my Facebook feed recently purporting to be a map of the most-streamed TV shows per state. This is the map:

We may never know what Washington, DC watches.

The map is from a confusingly named site called HomeSnacks. The site purports to "deliver bite-sized pieces of infotainment about where you live." I'm no fan of infotainment, primarily because it occasionally becomes a substitute for journalism, but also because it lowers the standard of data presentation for everyone. Data sources and methods are often poorly described even in peer-reviewed research, so I suppose it isn't surprising to see infotainment sites like HomeSnacks fail to describe their methods at all.

I suspect that the streaming map shown above is specifically a map of Google searches related to popular TV shows. Good luck finding any methodology on HomeSnacks. A link on, er, viral content portal Distractify suggests Google Trends may be involved, though its title also claims the map is Netflix-specific. (The presence of shows like the HBO-specific Game of Thrones indicates otherwise.) Netflix famously doesn't release viewership data, and third-party reports only include a handful of shows. Nielsen is supposedly generating some relevant data as well, though those data are not public and likely don't track every show available on every service.

The collection of show titles on the map is suspicious as well. It's very unlikely that the most-watched show in each state would be almost unique to that state. If similar maps are any indication, each state's result is simply the show Googled most often in that state, unless that show has already been assigned to another state. I'm still suspicious of how few oddities there are.

So, in the absence of ideal data, there are two possibilities:

  1. The map is a work of fiction
  2. The map is based on state-specific Google searches and filtered to produce the highest-value result not seen in any other state (except for Terriers, unless that's just a tie)
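
To make the second possibility concrete, here's a minimal sketch of how that kind of filtering might work. It is a guess at the approach, not HomeSnacks' actual method: the function name, the state processing order, and all of the scores below are made up for illustration.

```python
def unique_top_show(scores, state_order=None):
    """Give each state its highest-scoring show that no earlier state has taken.

    scores: dict mapping state -> {show: search-interest value}.
    state_order: order in which states claim shows (an assumption; the
    real order, if any, is unknown). Defaults to alphabetical.
    """
    taken = set()
    result = {}
    for state in state_order or sorted(scores):
        # Rank this state's shows from highest to lowest value.
        ranked = sorted(scores[state].items(), key=lambda kv: (-kv[1], kv[0]))
        for show, _value in ranked:
            if show not in taken:
                result[state] = show
                taken.add(show)
                break
    return result


# Hypothetical scores: Show X is the clear favorite in every state.
scores = {
    "State A": {"Show X": 90, "Show Y": 60, "Show Z": 10},
    "State B": {"Show X": 95, "Show Y": 50, "Show Z": 20},
    "State C": {"Show X": 80, "Show Y": 70, "Show Z": 30},
}

result = unique_top_show(scores)
print(result)
```

Even though every state in this toy example prefers Show X, the filtered output gives each state a different show. A map built this way would look pleasingly varied while saying very little about what anyone actually watches.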

I'd like to produce a better version. That's mostly because of my low tolerance for bad visualizations, but also because I'm curious to know what the data really say. Millions of people spend billions of hours with these streaming services, so they're a major cultural force (in terms of time spent doing one thing, at least).

In the absence of other data, Google Trends will have to serve as a proxy for viewing. This assumes that a search indicates interest, which in turn indicates a desire to view video content.

Here, we're just looking at relative volume of the search term "netflix" across the US since 2004. Maine, Idaho, New Mexico, and Montana all lead in Netflix Interest (July 2018 edit: this and the below maps are embedded rather than static, so the ranking has changed over time). They're also states with low population densities so that may be a factor. Is the same trend true for similar searches?

Maine and Idaho still lead in Hulu searches, and the overall pattern remains similar. Hulu and Netflix both began streaming video in 2007. Netflix started with DVDs ten years earlier, but perhaps the popularity of streaming video has erased that difference.*

Searches for streaming services probably don't reveal much. Is there any difference among searches for the names of popular TV shows? I arbitrarily chose three popular shows (Game of Thrones, Orange is the New Black, and House of Cards), then retrieved their state-level Google Trends results. These values are all on a scale of 0 to 100, with the maximum values indicating regions with the highest search incidence for that term. Each value is relative only to its own search term, so values are not comparable across terms except as an indicator of relative interest. In this case, Game of Thrones has greater overall search volume than the other two show names combined, but relative search interest for some other show may be higher in a particular region. In Arizona, for example, GoT has a value of 77 while Orange is 85.
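
As a rough sketch of the comparison, assuming the per-term values have already been downloaded into a simple dictionary: only Arizona's Game of Thrones and Orange is the New Black values come from the results above. The House of Cards value and the second state are hypothetical placeholders, as are the variable and function names.

```python
# Per-term Google Trends values on the 0-100 scale, keyed by state.
trends = {
    "Arizona": {
        "Game of Thrones": 77,
        "Orange is the New Black": 85,
        "House of Cards": 60,  # hypothetical value
    },
    "Hypothetical State": {  # entirely made up
        "Game of Thrones": 92,
        "Orange is the New Black": 70,
        "House of Cards": 81,
    },
}


def top_show(values):
    """Return the show with the highest relative search interest."""
    return max(values, key=values.get)


winners = {state: top_show(values) for state, values in trends.items()}
print(winners)
```

The winner here is the show with the highest regional interest relative to its own search term, which is not the same thing as the most-searched or most-watched show in that state.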

So, keeping in mind that we're comparing regional interest rather than absolute popularity, here is a map of the highest-valued of those three shows in each state:

DC is here, too. No Puerto Rico, though.

I used the statebins package - it's great for keeping all those tiny New England states visible.

It looks like, in some ways, my map agrees with the HomeSnacks map: House of Cards is popular in and near DC, MA is interested in GoT, and Nevada residents search for Orange is the New Black frequently. The west coast and northeast are consistently GoT-dominated. I should note that this binning approach is artificial - some states, like Delaware, have very similar values for all three shows. If I were an unscrupulous blogger, though, I'd title this figure "The South Loves Prison Shows And The North Loves Dragons".
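
One way to soften that artificial boundary would be to report the margin between the top two shows and decline to pick a winner when it's small. This is only a sketch: the 5-point threshold is arbitrary, the function names are mine, and both sets of values are hypothetical (they are not Delaware's real numbers).

```python
def winner_with_margin(values):
    """Return (top show, gap between the top value and the runner-up)."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    (first, first_value), (_second, second_value) = ranked[0], ranked[1]
    return first, first_value - second_value


def label(values, min_margin=5):
    """Name a winner only when it leads by at least min_margin points."""
    show, margin = winner_with_margin(values)
    return show if margin >= min_margin else "too close to call"


# Hypothetical values for a state where all three shows are nearly tied.
near_tie = {
    "Game of Thrones": 71,
    "Orange is the New Black": 70,
    "House of Cards": 69,
}
# Hypothetical values for a state with a clear favorite.
clear_lead = {
    "Game of Thrones": 60,
    "Orange is the New Black": 95,
    "House of Cards": 40,
}

print(label(near_tie))
print(label(clear_lead))
```

A map colored this way would have a fourth, less exciting category, but it wouldn't imply a regional preference where the numbers are essentially identical.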

This is just one map on a clickbait site. You'd be right to assume I'm more concerned about it than I should be. The same site publishes articles about subjects like "the most dangerous cities in a given state" and those articles may be easily misinterpreted as genuine research. Mayors get irritated about that kind of approach, at least.**

I think this might be the real take-home message: be skeptical of any figure without a clear data source and remain skeptical if the source is Google Trends. The line between information and infotainment is easily blurred.


* Remember Qwikster?

** TL;DR: don't misuse crime ranking data.