I hope to have some neat stuff for the blog soon, but in the meantime here’s a little roundup of things I have been doing elsewhere.
I am a contributor at nyloncalculus.com, a new basketball analytics blog, and I have written two pre-season previews for them, on the Mavericks and the Wizards:
Meet the New Wizards, Same as the Old Wizards
Mavericks and Tyson Chandler Look to Make Opponents Work Harder for Points
But I spent most of my time the last couple of months creating shot charts for them. A recent update made these interactive, allowing you to see a player’s accuracy and volume from any point on the floor:
I’ve written two articles explaining how these work but I want to fill in a few of the methodological details. First, the articles:
A Better Shot Chart
There are a few little methodological details I have not mentioned in those articles, but I get questions about them on Twitter, so I want to take a second to address them here.
A lot of people ask about the distance weighting. I realize that there are complicated ways to do this (although I’m not readily familiar with them), but I have primarily concerned myself with creating something that looks good and is reasonably accurate. To that end I am using very simple inverse distance weighting, where the denominator is the square of distance. From a design perspective, the challenge is to have smooth color variation. Ideally, there should not be a deep blue square right next to a bright red one. This can happen if you give too much weight to near points and not enough weight to far points. To get an intuitive handle on this, imagine that you give far shots no weight at all, and positive weight only to shots within the square. That will lead to a lot of variation in color because the 1ft x 1ft squares have few shots in them. If we could have a player shoot 100 shots from a specific spot on the floor, we wouldn’t have to weight, because we would have a good estimate of that player’s skill from that spot. But we don’t, so weighting is essentially a way to create larger, more stable bins of data. I posted an example of an ‘unbinned’ shot chart in this old blog post.
So there is a tradeoff involved in choosing your distance weighting formula. If a large weight is given to more distant points, the colors on the graph will vary smoothly and provide a more accurate sense of the player’s FG% from a spot (since you are roping in more points to provide a more stable/lower error estimate). However, if too large a weight is given to distant points, then the color of a square will not accurately reflect local FG%. Clearly, weighting all shots within 10 feet equally would be a poor strategy. This would mix high-efficiency shots right under the basket with low-efficiency shots in the midrange.
I experimented and decided I liked 1/distance² the most. As I said above, I could be doing more sophisticated things, but the question is what would a more sophisticated formula buy me? From the end user perspective, there would be virtually no difference.
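To make the weighting concrete, here is a minimal sketch of inverse distance weighting with the square of distance in the denominator. This is not the code behind the charts; the function name, the (x, y, made) shot format, and the min_dist floor (which keeps a shot sitting exactly on the square’s center from dividing by zero) are all assumptions for illustration.

```python
import math

def weighted_fg_pct(cell_x, cell_y, shots, power=2.0, min_dist=0.5):
    """Estimate FG% at a square's center as a distance-weighted average.

    shots: iterable of (x, y, made), coordinates in feet, made as a bool.
    power: exponent on distance in the denominator (2 = inverse square).
    min_dist: hypothetical floor on distance, in feet, to avoid dividing
              by zero when a shot lands exactly on the center.
    """
    weighted_makes = 0.0
    total_weight = 0.0
    for x, y, made in shots:
        dist = max(math.hypot(x - cell_x, y - cell_y), min_dist)
        weight = 1.0 / dist ** power
        weighted_makes += weight * (1.0 if made else 0.0)
        total_weight += weight
    # No shots at all means there is nothing to estimate.
    return weighted_makes / total_weight if total_weight > 0 else None
```

Raising power gives near shots more say and makes the colors choppier; lowering it toward 0 moves toward equal weighting, which smooths the colors but blurs local FG%.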
Another question I used to get, which has been greatly reduced by the addition of interactivity, is about the size of boxes on the chart. First, I want to confirm that I am in fact sizing them according to area, not side length, so take that. I’ve sort of glossed over this detail on Nylon Calculus, but the boxes are sized on a log scale. NBA shot volume is massively tilted towards the rim. About 42% of all shots are taken within 8 feet of the rim. The challenge then is that you want to show a player’s midrange shots, which are both a smaller % of his total shot volume and spread out over a larger area, on the same graph where you’re showing rim shots. For a lot of players, this means scaling a box with 100 shots in it and a box with 4 shots in it. I can’t have some boxes 25x the size of others, so there’s really no way to get around the log scale. With the new interactive features, you can see the shot volume in an area, so this is much less of a concern than it used to be.
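Here is a rough sketch of what sizing by area on a log scale can look like. The exact formula is an assumption (log(1 + shots), normalized by the busiest box), as are the function and parameter names; the point is only that area, not side length, tracks the log of shot volume.

```python
import math

def box_side(shot_count, max_count, max_side=1.0):
    """Side length of a box whose AREA grows with log(1 + shot_count).

    max_count: shot count of the busiest box, which gets the full max_side.
    max_side: side length, in feet, of the largest box (hypothetical default).
    """
    if shot_count <= 0 or max_count <= 0:
        return 0.0
    # Fraction of the maximum area this box should cover.
    area_fraction = math.log1p(shot_count) / math.log1p(max_count)
    # Side is the square root of area, so area stays proportional to the log.
    return max_side * math.sqrt(min(area_fraction, 1.0))
```

Under this particular formula, a 100-shot box ends up with a bit under 3x the area of a 4-shot box, rather than 25x.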
One final note is about placing boxes. I honestly had absolutely no idea what kind of algorithm I should use to place boxes on the chart. I decided early on that the right amount of visual clutter was about 200 boxes, so the shot charts render at most that many (or maybe it is 250 now, I forget!) But how to pick the 200 that are rendered? Initially I just picked the 200 boxes with the most shots in them but I had no tie breaker, and tons of boxes for any given player would be tied at 2 or 3. This could lead to the algorithm arbitrarily picking boxes that are outlier locations for the player. So I needed a tie breaker, and I chose nearby shot volume. The tie breaker right now is number of shots within 2 feet. There are two important consequences of using this as a tie breaker. First, this tends to remove stray boxes that are out on their own and instead give the graphs a nice clustered boxes look. A good thing, in my opinion. Second, it tends to exaggerate a player’s locational tendencies. I have often used Kevin Love as an example of this:
Love vastly prefers the left side to the right. My charts exaggerate this tendency, because when two boxes are tied in volume, one on the right side and one on the left side, the left side one ‘wins’, and Love picks up another box on the left side. Is there a better way to do this? I have no idea. You can’t perfectly show shot volume by drawing a limited number of boxes like this, but I’m certainly curious about ways to show shot volume better.
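For anyone who wants to play with the box placement, here is a minimal sketch of the selection rule described above: rank boxes by their own shot count, and break ties by the number of shots within 2 feet. The data layout and names are assumptions, and I am counting a box’s own shots as part of its nearby volume, which may not match the real charts exactly.

```python
import math

def pick_boxes(boxes, shots, max_boxes=200, radius=2.0):
    """Choose which boxes to draw.

    boxes: list of (x, y, count) giving each box's center and shot count.
    shots: list of (x, y) locations for every shot by the player.
    radius: tie-breaker distance in feet.
    """
    def nearby_volume(box):
        bx, by, _ = box
        return sum(
            1 for sx, sy in shots
            if math.hypot(sx - bx, sy - by) <= radius
        )

    # Sort by shot count first, then by nearby volume, largest first.
    ranked = sorted(boxes, key=lambda b: (b[2], nearby_volume(b)), reverse=True)
    return ranked[:max_boxes]
```

Because a tied box in a busy neighborhood always beats a tied box in a quiet one, this rule produces the clustered look and also the exaggeration of a player’s preferred side.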