#makeovermonday: Data breaches from informationisbeautiful.net

I’m going to try to do something quick every week for Andy Kriebel and Andy Cotgreave’s #makeovermonday, both to keep showcasing the different types of graphs you can make in my graphing web app Playfair and to identify and quash bugs! This week they chose an interactive from David McCandless’s informationisbeautiful.net showing the number of records leaked in various data breaches between 2004 and 2016. Andy gives a nice run-down of the pros and cons of the original graphic.

I made the mistake of looking at a few early entries, and a major theme that seems to have struck several people is the division between data breaches that were the result of hacks and those that weren’t (including user error and lost equipment). A quick look at the data shows that the former are increasing rapidly. My first thought was that an area chart would show this trend nicely, but then I remembered that I recently implemented a variation on area charts that might be neat here. I’m actually not sure what you call this kind of chart, but it’s simply an area chart with two categories where both areas originate from y=0. The area for one category is above the x-axis and the area for the other is below it. Here’s my entry for this data set:
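For anyone curious how the data for a chart like this can be laid out, here is a minimal sketch in Python. It is not how Playfair does it; the function names and the sample numbers are made up purely for illustration. The idea is just to keep one category positive and flip the sign of the other, so both areas start at y=0 and grow in opposite directions.

```python
def mirrored_area_series(above, below):
    """Return (upper, lower) y-values for a two-category area chart.

    `above` is drawn above the x-axis as-is; `below` is negated so that
    its area hangs below the x-axis. Both series share the y=0 baseline.
    """
    if len(above) != len(below):
        raise ValueError("both categories need one value per x position")
    upper = [float(v) for v in above]
    lower = [-float(v) for v in below]
    return upper, lower


def area_polygon(xs, ys):
    """Build the closed outline of one area: baseline, series, baseline."""
    return [(xs[0], 0.0)] + list(zip(xs, ys)) + [(xs[-1], 0.0)]


# Placeholder values only, NOT the real breach data.
years = [2004, 2005, 2006]
hacked = [1, 3, 6]
not_hacked = [2, 2, 1]

upper, lower = mirrored_area_series(hacked, not_hacked)
hacked_outline = area_polygon(years, upper)
other_outline = area_polygon(years, lower)
```

Each outline can then be handed to whatever drawing library you like as a filled polygon.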



New Stuff Roundup: shot charts and a few bball articles

I hope to have some neat stuff for the blog soon, but in the meantime here’s a little roundup of things I have been doing elsewhere.

I am a contributor at nyloncalculus.com, a new basketball analytics blog, and I have written two pre-season previews for them, on the Mavericks and the Wizards:
Meet the New Wizards, Same as the Old Wizards
Mavericks and Tyson Chandler Look to Make Opponents Work Harder for Points

But I spent most of my time the last couple of months creating shot charts for them. A recent update made these interactive, allowing you to see a player’s accuracy and volume from any point on the floor:



‘Fair’ Districts in the US House, or the lack thereof

This is a quick[1] little demonstration I made for my POLS 206 class to show how single-member districts in the House of Representatives can cause weird stuff to happen. The map shows the composition of the House delegation from each state. House delegations from red states are mostly Republican, those from blue states are mostly Democratic, and those from purple states are split in some way.

What I want to show my class is the difference between the composition of a state’s House delegation and the popular vote for members of the House in that state. In Maine, for example, two out of two Representatives are Democrats, but 38% of the state’s voters voted for a Republican representative. If you believe that a ‘fair’ House delegation is one in which the number of Rs and Ds reflects the split between Rs and Ds among the state’s voters, then Maine should have 1 Republican rep and 1 Democratic rep (0.38*2=0.76, which rounds to 1 Republican). When you mouse over a state, the state will change color to reflect the House delegation split that would most accurately mirror the popular vote split. Both these numbers are shown in the upper right corner of the map.
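The Maine arithmetic can be written out as a short Python sketch. This is only an illustration of the rounding rule described above, not the code behind the map. The function name is made up, and it assumes a two-party split, so every seat not assigned to a Republican goes to a Democrat.

```python
import math


def fair_split(seats, rep_vote_share):
    """Return (republican_seats, democratic_seats) for a 'fair' delegation.

    Multiplies the number of seats by the Republican vote share and rounds
    to the nearest whole seat (halves round up). Assumes two parties only.
    """
    rep = math.floor(seats * rep_vote_share + 0.5)
    rep = max(0, min(seats, rep))  # keep the result within the delegation
    return rep, seats - rep


# Maine: 2 seats, 38% of voters chose a Republican (0.38 * 2 = 0.76).
maine = fair_split(2, 0.38)
```

For Maine this gives 1 Republican and 1 Democrat, matching the example in the text.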

Partisan Composition of State House Delegations


  1. It should have taken like an hour but CSS.

Mapping BBQ locations using Yelp

A little while back a friend suggested that maybe we could write a fun piece about BBQ. We are both living in Texas and while BBQ has always been a big deal in Texas, it seems to be getting bigger, with Texas Monthly appointing a full-time BBQ editor. I’ve been wanting to learn to work with maps better and to get into interactive web visualizations for a while, so this seemed like a good excuse to work on both.

I used the Yelp API to get data on BBQ restaurants in each of the 30 largest US cities (according to Wikipedia, city limits only). You’re really not supposed to use the API like this and in the future I’ll use one of Yelp’s academic data sets. To make it work, I had to split each city up into 625 lat/lon grid points and query each one separately. I averaged the ratings of each restaurant I found to get an average for the city. I also sampled a small number of restaurants in each city and collected review text for those restaurants (Yelp really doesn’t like you doing this, since it can’t be done with the API, and I don’t recommend trying it. Let’s just say that I can no longer read Yelp at home.) I used a dictionary of food adjectives I found on the web to pare the corpus down, and found three words for each city that are frequently used to describe that city’s BBQ. As you can see from mousing over the cities on the map below, this didn’t always work out great. More thoughts under the map…
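Here is a rough Python sketch of the grid-and-average idea described above. It makes no Yelp API calls and is not the script I actually ran. The bounding box, the function names, and the 25 x 25 layout (which is one way to get 625 points) are all assumptions for illustration. Dropping duplicate restaurants before averaging is also an assumption, based on the fact that neighboring grid points can return the same place.

```python
def grid_points(south, west, north, east, n=25):
    """Return n*n evenly spaced (lat, lon) points covering a bounding box."""
    lat_step = (north - south) / (n - 1)
    lon_step = (east - west) / (n - 1)
    return [
        (south + i * lat_step, west + j * lon_step)
        for i in range(n)
        for j in range(n)
    ]


def city_average(results):
    """Average the ratings of the unique restaurants found in a city.

    `results` is an iterable of (restaurant_id, rating) pairs pooled from
    all grid-point queries; repeated ids are counted only once.
    """
    unique = {}
    for restaurant_id, rating in results:
        unique[restaurant_id] = rating
    if not unique:
        raise ValueError("no restaurants found")
    return sum(unique.values()) / len(unique)


# Hypothetical bounding box, not a real city's limits.
points = grid_points(29.0, -96.0, 30.0, -95.0)

# Made-up results: restaurant "a" was returned by two different queries.
average = city_average([("a", 4.0), ("b", 3.0), ("a", 4.0)])
```

Each point in `points` would be the center of one separate query, and the pooled results feed into `city_average`.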

US BBQ according to Yelp reviews

Circle size is proportional to the number of restaurants. Mouse over a city to see adjectives commonly used in Yelp reviews for area restaurants.

Color scale (average restaurant review): >3.7, >3.6, >3.5, >3.4, >3.3, >3.2, <3.1
