For an arbitrary set of data points, what’s the ‘best’ graph axis for those points? Say that your x-variable runs from 5 to 45. A human can quickly pick a few promising options for the x-axis. It could have ticks at 0, 15, 30, and 45. Or perhaps 0, 10, 20, 30, 40, and 50. Getting a program to generate ‘nice’ options like this is a bit trickier. I’ve been working on a graphing app recently and I’ve reproduced my solution, with notes, below.
Although there are a number of algorithms people have put forth on Stack Overflow and elsewhere, many of these do not handle certain kinds of data sets correctly, and virtually none treat 0 correctly in my opinion. I started from scratch with the following 5 rules for my function:
- If an axis crosses 0, 0 *must* be an axis tick value
- Axis ticks *must* be attached to a grid line (i.e. you can’t have x-axis ticks floating in space – they must be attached to a y-axis grid line as you see in the example above)
- The data should be as tightly contained as possible (little wasted space)
- There should be no fewer than 4 ticks on each axis, and if rule 2 requires it, up to 10
- Numbers should be ‘nice’ (round numbers etc.)
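To make the rules above concrete, here is a minimal Python sketch of one way they could be satisfied. It is not the code from my app: the function name `nice_ticks`, the set of ‘nice’ multipliers (1, 2, 2.5, 5) and the tie-breaking choice are all assumptions made for illustration. Because every tick is a whole multiple of the step, 0 automatically lands on a tick whenever the data crosses 0.

```python
import math

# Assumed set of "nice" step multipliers; other sets (e.g. adding 3) are possible.
NICE_MULTIPLIERS = (1, 2, 2.5, 5)


def nice_ticks(data_min, data_max, min_ticks=4, max_ticks=10):
    """Return tick values that cover [data_min, data_max] with a nice step."""
    if data_max <= data_min:
        raise ValueError("data_max must be greater than data_min")

    span = data_max - data_min
    exponent = math.floor(math.log10(span))
    eps = 1e-9  # tolerance so float noise does not add a spurious extra tick
    best = None

    # Try nice steps across a few powers of ten around the size of the data span.
    for exp in range(exponent - 2, exponent + 2):
        for mult in NICE_MULTIPLIERS:
            step = mult * 10.0 ** exp
            lo = math.floor(data_min / step + eps)  # index of the first tick
            hi = math.ceil(data_max / step - eps)   # index of the last tick
            count = hi - lo + 1
            if not (min_ticks <= count <= max_ticks):
                continue
            # Wasted space: how much longer the axis is than the data span.
            waste = (hi - lo) * step - span
            key = (round(waste / span, 9), count)  # tightest fit, then fewest ticks
            if best is None or key < best[0]:
                best = (key, step, lo, hi)

    if best is None:
        raise ValueError("no candidate step satisfied the tick-count limits")

    _, step, lo, hi = best
    # Ticks are whole multiples of the step, so 0 is a tick whenever lo <= 0 <= hi.
    return [round(i * step, 10) for i in range(lo, hi + 1)]


print(nice_ticks(5, 45))
print(nice_ticks(-3, 12))
```

Note that for data running from 5 to 45 this sketch prefers the tightest axis (steps of 5 from 5 to 45) over the looser 0 to 50 option, because it ranks candidates purely by wasted space. Weighting ‘niceness’ or a preference for starting at 0 would need an extra term in the ranking.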
I’ve just released a new version of my NBA shot chart app. Unfortunately, Nylon Calculus is unable to host the app at this time, so you can find it by clicking “shot charts” in the menu bar or by visiting www.austinclemens.com/shotcharts. Rather than read this boring blog post, you should probably just go play with it. If you want more of an introduction however, I have some version notes after the jump.
This is not a project I did for the blog, but I think it’s worthy of a blog post. I made some tutorials on using R and ggplot2 for my colleagues at Nylon Calculus. Partly this was just to give a few of the writers a gentle introduction to R, but I also wanted to challenge myself to a completely pointless task: creating a standard format for NC graphs with instructions easy enough for a non-coder to follow. I think I did ok (we’ll see how adoption on NC is). This is such a common problem for amateur bloggers though: how do you create a professional, standardized look? If you’re Vox.com or fivethirtyeight.com, you have people who can create web apps to make graphs, or who can just make graphs for writers, but for amateur bloggers there’s no easy way to solve this problem. I think ggplot with themes is easy enough to use that it could be a feasible option for amateurs.
Anyway, here’s a sample graph:
I did some freelance work for ESPN The Magazine. You can see it here; I am responsible for the little charts at the bottom of each preview. ESPN’s design department is actually responsible for the look. I just did the analysis and provided them with something more akin to my shot charts, which they then converted, so kudos to them on the nifty design.
This was a fun project, but there were a lot of little pitfalls, and while most fans won’t care that much, I feel like I have to do a post mortem for the analytics community to explain some of the details.
@Cmrn_DP put together some code to make Matplotlib graphs that look like fivethirtyeight.com graphs. I see the attraction–fivethirtyeight graphs have a very simple, attractive look–but I’m not much of a Matplotlib user, so I took a few minutes to try to get the same style in R’s ggplot2 package. Here’s the result:
I rigged up a convenient tool for displaying my adjusted defensive impact graphs. The graphs are drawn as SVGs, and the code is all written to produce graphs that are 750 pixels wide. Unfortunately that has meant that when someone else wants to display a graph (or if I want to use it in a different context), the graphs resize poorly.
So I kluged together a little API of sorts that will draw them at any width you like. Unfortunately I’m still having trouble converting them to PNGs, so you’ll still have to take screen caps of graphs you want to display, but at least it will look right now.
So how do you do it? Just adjust the values in the URL below. The ‘width’ parameter can be changed to anything at all (although it will probably look a bit weird under 500 pixels or so), and the ‘player’ parameter takes a player’s full name. Make sure you capitalize the first and last names and put %20 between them. An example URL and its associated graph are shown below.
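If you would rather build the URL in code than edit it by hand, a minimal Python sketch might look like this. The base URL here is a hypothetical placeholder (substitute the real address of the tool), and the order of the parameters is an assumption; only the `width` and `player` parameter names and the %20 encoding come from the description above.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder; replace with the real address of the graph tool.
BASE_URL = "https://example.com/adi-graph"


def graph_url(player, width=750, base_url=BASE_URL):
    """Build a graph URL with spaces in the player name encoded as %20."""
    # quote_via=quote makes urlencode emit %20 for spaces instead of "+".
    query = urlencode({"width": width, "player": player}, quote_via=quote)
    return f"{base_url}?{query}"


print(graph_url("Rashard Lewis", width=600))
```

The function does not fix capitalization for you, so pass the name exactly as it should appear (for example "Rashard Lewis").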
@treycausey tweeted about this Uber blog post about how Uber reduced DUI arrests in Seattle. Using a technique called regression discontinuity, they claim that Uber has reduced DUI arrests by about 10% on average. The post bugged me though, because there is not a lot of detail on the methods, and regression discontinuity is the sort of research design that is very much dependent on specification. In this post I replicate the study and walk through what regression discontinuity is and why it can be a very effective research design. Ultimately I think it’s plausible that Uber did in fact reduce DUIs in Seattle, but the story is a bit more complex than the blog post lets on. Continued under the break!
Here are the grisly details about calculating adjusted defensive impact by court location that nobody has been waiting for. I’ve partially explained the methods before here, but some things have changed and that explanation was not comprehensive. So buckle in and get ready for a lot of really dry prose, a couple of kinda cool graphs, and some code that should help you understand the algorithm I’m using to select shots.
This is a really great article from Miles Wray about Spoelstra’s decision to put Rashard Lewis on David West in game 4 of the Eastern Conference Finals. I want to add a little footnote to it. Miles’ main point is that Rashard Lewis is pretty fast for a 6’10” guy, and that allows him not only to play West aggressively, but also to help out on other players and then recover quickly and get back to West. This leads to more turnovers.
This turns out to be an interesting case for my Adjusted Defensive Impact visualization too though. The construction of ADI is similar to Adjusted Plus Minus approaches, but this is a case where APM and its various flavors don’t tell the full story. If you take a look at Jeremias Engelmann’s xRapm stats, you can see that Rashard Lewis is a slight negative on defense. But take a look at his adjusted defensive impact viz:
Lewis isn’t a great defender overall, but he’s a pretty decent defender near the basket. More than that, he’s especially good on the right side close to the basket and in the midrange. Guess who loves those spots? Here’s David West’s season shot chart from NBA.com:
This is not really the story of last night–West actually shot well when Lewis was in the game (7 for 12) and I think Wray’s assessment is pretty accurate: Lewis was valuable not because he shut down West (who had a good game overall) but because he forced turnovers. But it means that Lewis can do the things he does well, per Wray’s analysis, without being a defensive liability in his matchup with West.
The Birdman is out for tonight’s Heat v Pacers game and that could make a big difference. Miami has a very strong defensive front court. They start Bosh, who is a very canny defensive player, but there is no drop off when Bosh sits, because Birdman is an excellent defensive center. Check out the numbers from my adjusted defensive impact graph for Andersen. Warm colors (red/orange) indicate that offensive players shoot better when Birdman is on the court. Blue colors indicate that offensive players suffer when Birdman is on the court, and Birdman brings the blue. See other players here.
The graph shows how Birdman affects opposing FG% after adjusting for other defenders and the other offensive players on the court. Don’t take my word for it though: he also puts up amazing defensive xRapm stats.