As part of my work at the Washington Center for Equitable Growth, I created a web app for making charts in the Equitable Growth (EG) house style. At EG we produce reports and policy papers on economic topics, and each one frequently features several charts. Because the organization employs only one person proficient in Illustrator, creating these charts was an organizational bottleneck.
An example chart made in Playfair
To overcome this problem, I built a free web app for charting data. This post is a short introduction to the app, which I call Playfair. There’s also a lot of information on the GitHub page. I’ve made a simple tutorial that walks you through a couple of Playfair graphs. And you can try Playfair online here (Chrome only right now).
For an arbitrary set of data points, what’s the ‘best’ graph axis for those points? Say that your x-variable runs from 5 to 45. A human can quickly pick a few promising options for the x-axis. It could have ticks at 0, 15, 30, and 45. Or perhaps 0, 10, 20, 30, 40, and 50. Getting a program to generate ‘nice’ options like this is a bit trickier. I’ve been working on a graphing app recently and I’ve reproduced my solution, with notes, below.
Although people have put forth a number of algorithms on Stack Overflow and elsewhere, many of them don’t handle certain kinds of data sets correctly, and in my opinion virtually none treat 0 correctly. I started from scratch with the following five rules for my function:
- If an axis crosses 0, 0 *must* be an axis tick value
- Axis ticks *must* be attached to a grid line (i.e., you can’t have x-axis ticks floating in space; they must be attached to a y-axis grid line, as you see in the example above)
- The data should be as tightly contained as possible (little wasted space)
- There should be no fewer than 4 ticks on each axis, and if rule 2 requires it, up to 10
- Numbers should be ‘nice’ (round numbers etc.)
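Rules 1, 3, 4, and 5 can be sketched for a single axis in Python. This is a minimal illustration under my own assumptions, not Playfair’s actual code: the set of “nice” step sizes (1, 1.5, 2, 2.5, 3, or 5 times a power of ten), the wasted-space score, the tie-break toward more ticks, and the name `nice_ticks` are all hypothetical choices for the example. Rule 2 involves laying out both axes together, so it is left out here.

```python
import math

# Hypothetical set of "nice" leading digits for the tick step (rule 5).
NICE_MULTIPLIERS = (1, 1.5, 2, 2.5, 3, 5)


def nice_ticks(data_min, data_max, min_ticks=4, max_ticks=10):
    """Return evenly spaced, round tick values covering [data_min, data_max]."""
    if not data_min < data_max:
        raise ValueError("data_min must be smaller than data_max")
    span = data_max - data_min
    base = math.floor(math.log10(span))
    best = None
    # Try steps across four decades around the size of the data range.
    for exponent in range(base - 2, base + 2):
        for mult in NICE_MULTIPLIERS:
            step = mult * 10 ** exponent
            # Ticks are integer multiples of step, so 0 is automatically a
            # tick whenever the axis runs from below 0 to above 0 (rule 1).
            first = math.floor(data_min / step)
            last = math.ceil(data_max / step)
            count = last - first + 1
            if not min_ticks <= count <= max_ticks:
                continue  # rule 4: between 4 and 10 ticks
            axis_span = (last - first) * step
            waste = (axis_span - span) / axis_span  # rule 3: less is better
            key = (waste, -count)  # on ties, prefer more ticks
            if best is None or key < best[0]:
                best = (key, first, last, step)
    if best is None:
        raise ValueError("no candidate step produced an acceptable tick count")
    _, first, last, step = best
    # round() strips float noise such as 0.30000000000000004.
    return [round(i * step, 10) for i in range(first, last + 1)]
```

For the 5 to 45 example, this sketch returns 5, 10, ..., 45 (nine ticks, no wasted space) rather than 0, 15, 30, 45; preferring axes that start at 0 would need an extra term in the score. For a range that crosses zero, such as -12 to 37, it returns -20, -10, 0, ..., 40, with 0 included by construction.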
I’ve just released a new version of my NBA shot chart app. Unfortunately, Nylon Calculus is unable to host the app at this time, so you can find it by clicking “shot charts” in the menu bar or by visiting www.austinclemens.com/shotcharts. Rather than read this boring blog post, you should probably just go play with it. If you want more of an introduction, however, I have some version notes after the jump.
This is not a project I did for the blog, but I think it’s worthy of a blog post. I made some tutorials on using R and ggplot2 for my colleagues at Nylon Calculus. Partly this was just to give a few of the writers a gentle introduction to R, but I also wanted to challenge myself to a completely pointless task: creating a standard format for NC graphs with instructions easy enough for a non-coder to follow. I think I did OK (we’ll see how widely it’s adopted at NC). This is such a common problem for amateur bloggers, though: how do you create a professional, standardized look? If you’re Vox.com or fivethirtyeight.com, you have people who can create web apps to make graphs, or who can just make graphs for writers, but amateur bloggers have no easy way to solve this problem. I think ggplot2, with themes, is easy enough to use that this could be a feasible option for amateurs.
Anyway, here’s a sample graph:
I did some freelance work for ESPN The Magazine. You can see it here; I’m responsible for the little charts at the bottom of each preview. ESPN’s design department is actually responsible for the look: I just did the analysis and provided them with something more akin to my shot charts, which they then converted, so kudos to them on the nifty design.
This was a fun project, but there were a lot of little pitfalls, and while most fans won’t care much, I feel I should do a post-mortem for the analytics community that explains some of the details.
I hope to have some neat stuff for the blog soon, but in the meantime here’s a little roundup of things I have been doing elsewhere.
I am a contributor at nyloncalculus.com, a new basketball analytics blog, and I have written two pre-season previews for them, on the Mavericks and the Wizards:
Meet the New Wizards, Same as the Old Wizards
Mavericks and Tyson Chandler Look to Make Opponents Work Harder for Points
But I spent most of my time over the last couple of months creating shot charts for them. A recent update made these interactive, allowing you to see a player’s accuracy and volume from any point on the floor:
@Cmrn_DP put together some code to make Matplotlib graphs that look like fivethirtyeight.com graphs. I see the attraction (fivethirtyeight graphs have a very simple, attractive look), but I’m not much of a Matplotlib user, so I took a few minutes to try to get the same style in R’s ggplot2 package. Here’s the result:
I rigged up a convenient tool for displaying my adjusted defensive impact graphs. The graphs are drawn as SVGs, and the code is written to produce graphs that are 750 pixels wide. Unfortunately, that meant the graphs resized poorly whenever someone else wanted to display one (or whenever I wanted to use one in a different context).
So I kluged together a little API of sorts that will draw them at any width you like. Unfortunately, I’m still having trouble converting them to PNGs, so you’ll still have to take screen caps of graphs you want to display, but at least they will look right now.
So how do you do it? Just adjust the values in the URL below. The ‘width’ parameter can be set to any value (although the graph will probably look a bit weird below 500 pixels or so), and the ‘player’ parameter takes a player’s full name. Make sure you capitalize the first and last names and put %20 between them. An example URL and its associated graph are shown below.
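Rather than editing the URL by hand, you could build it in a few lines of Python. This is a hedged sketch: `graph_url` is a hypothetical helper, and the base URL in the usage line is a placeholder, so swap in the address from the example URL. Only the `player` and `width` parameter names come from this post. Passing `quote_via=quote` to `urlencode` makes the space between names come out as %20 rather than `+`.

```python
from urllib.parse import quote, urlencode


def graph_url(base_url, player, width=750):
    """Build a request URL for `player`'s graph drawn `width` pixels wide.

    Hypothetical helper; base_url must be the real graph address.
    The player's name is passed through unchanged, so capitalize it yourself.
    """
    if width <= 0:
        raise ValueError("width must be a positive number of pixels")
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, uses '+').
    query = urlencode({"player": player, "width": width}, quote_via=quote)
    return f"{base_url}?{query}"


# Placeholder address, not the real one.
print(graph_url("https://example.com/graph", "Tim Duncan", 600))
```

The name is deliberately not title-cased automatically, since that would mangle names like LeBron.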
@treycausey tweeted about an Uber blog post describing how Uber reduced DUI arrests in Seattle. Using a technique called regression discontinuity, the post claims that Uber has reduced DUI arrests by about 10% on average. The post bugged me, though, because it gives little detail on its methods, and regression discontinuity is the sort of research design whose results depend heavily on specification. In this post I replicate the study and walk through what regression discontinuity is and why it can be a very effective research design. Ultimately I think it’s plausible that Uber did in fact reduce DUIs in Seattle, but the story is a bit more complex than the blog post lets on. Continued under the break!
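For readers new to the design, here is a minimal sketch of a textbook sharp regression discontinuity estimate. It fits a straight line on each side of a cutoff in the running variable and takes the gap between the two fits at the cutoff. This is a generic illustration, not the specification used by Uber or in the replication. The function names, the uniform kernel (every point within the bandwidth counts equally), and the linear fit are my assumptions. The `bandwidth` argument is exactly the kind of specification choice the estimate can be sensitive to.

```python
def _fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Sharp RD: estimated jump in the outcome at the cutoff.

    Uses only points within `bandwidth` of the cutoff and fits a separate
    line on each side. The running variable is centered on the cutoff, so
    each intercept is that side's predicted outcome exactly at the cutoff.
    """
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    for side in (left, right):
        if len({x for x, _ in side}) < 2:
            raise ValueError("need two distinct running values on each side")
    left_at_cutoff, _ = _fit_line(*zip(*left))
    right_at_cutoff, _ = _fit_line(*zip(*right))
    return right_at_cutoff - left_at_cutoff
```

With synthetic data that jumps by exactly 3 at the cutoff, for example `outcome = 2 + 0.5 * running`, plus 3 whenever `running >= 0`, calling `rd_estimate(running, outcome, 0, 5)` recovers 3 up to floating-point error. On real data, rerunning it with several bandwidths is a quick check on how much the answer depends on that choice.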
Here are the grisly details about calculating adjusted defensive impact by court location that nobody has been waiting for. I’ve partially explained the methods before here, but some things have changed and that explanation was not comprehensive. So buckle in and get ready for a lot of really dry prose, a couple of kinda cool graphs, and some code that should help you understand the algorithm I’m using to select shots.