Smoothing the data and the problem with on/off court analytics

In keeping with the subject of my last post, I’ve slapped together a partial fix before I start working on a much bigger change in this whole endeavor. One way to deal with binning problems is just to smooth the data somehow. Here’s a picture of a histogram, for example, showing both the binned data and a kernel density plot.

histogram
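To make the binned-versus-smoothed contrast concrete, here is a minimal sketch in plain Python. The function names, the Gaussian kernel, and the bandwidth are all assumptions chosen for illustration; they are not taken from the plot above.

```python
import math


def histogram(samples, lo, hi, bins):
    """Count samples in equal-width bins covering [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against floating-point rounding at the top edge
            counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts


def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of one Gaussian bump per sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density
```

The histogram's shape depends on where the bin edges happen to fall. The kernel estimate has no edges, only a bandwidth that controls how much each sample is spread out.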

In this spirit, I decided to ditch the rigid regions I’ve used in other graphs altogether. Instead, to get a FG% for each box on the graph, I find the shots nearest that square until I’ve collected 8% of the total data, and calculate a FG% using those shots. Basically, there’s a larger circle around almost every square (not every square, because some already have 8% of the data in them) that’s used to determine FG%. Squares are only drawn at all if they reach a certain threshold % of shot volume, so for the most part these circles are not large. The circles also only draw on shots of the same type, so a 2-pt square does not look at any 3-pt shots.
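A rough sketch of that nearest-shots idea in Python follows. It is not the code behind these graphs: the `Shot` record, the `smoothed_fg_pct` name, and the reading of "8% of the total data" as 8% of all shots (rather than 8% of each shot type) are assumptions. It also treats each square as its center point and leaves out the shot-volume threshold for drawing.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    x: float        # court coordinates in arbitrary units (assumed)
    y: float
    made: bool
    is_three: bool


def smoothed_fg_pct(shots, cx, cy, is_three, fraction=0.08):
    """FG% for the square centered at (cx, cy), from its nearest same-type shots."""
    same_type = [s for s in shots if s.is_three == is_three]
    if not same_type:
        return None  # no shots of this type to estimate from
    # Assumption: the 8% target is measured against all shots, not per type.
    k = max(1, math.ceil(fraction * len(shots)))
    nearest = sorted(
        same_type, key=lambda s: math.hypot(s.x - cx, s.y - cy)
    )[:k]
    return sum(s.made for s in nearest) / len(nearest)
```

Taking a fixed number of neighbors, rather than a fixed radius, is what makes the circle small where shots are dense and larger where they are sparse.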

Here are those graphs for Marc Gasol, which I want to use to illustrate another point:

Opposing FG%, Marc Gasol on court

Opposing FG%, Marc Gasol off court

First, you can see that there are more subtle color gradations within regions now. This effect is most noticeable at the borders of my previous regions. Formerly, shots on these borders were fairly distant from the shots they were being grouped with to determine FG%. Now, every square is the center of the shots being used to color it.

So why Marc Gasol? I wanted to demonstrate something really obvious, which is that on/off court metrics have problems. You can see that the graphs suggest that when Gasol is in the game, the Grizzlies’ 3-pt defense declines dramatically. Is that Gasol’s fault? There are certainly ways this could happen, but you may also know that Gasol was out for the beginning of the season, and his return coincides almost perfectly with Tony Allen getting injured. The two of them have spent very little time together on the floor and it just so happens that Tony Allen is an excellent perimeter defender. So the graphs are very misleading on this point!

How do you ‘fix’ this? This is a known issue with +/- statistics, which have been in use for some time. The adopted solution, the adjusted +/- stat, uses a regression to try to control for the offensive and defensive players who share the court with a player. I’ll be working on something similar next, which will let me consolidate two graphs into one and show you how a player impacts opposing FG% in different areas of the court while holding the quality of the other players on the court constant.
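To show why a regression helps, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the players "A" and "B" are hypothetical, the stint numbers are made up, and the model (opposing FG% as an intercept plus one term per defender on the court, fit by weighted least squares with a tiny ridge penalty) is a heavy simplification of real adjusted +/-.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def naive_on_off(stints, player):
    """Weighted opposing FG% with the player on court minus with him off."""
    def avg(pairs):
        return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)
    on = [(y, w) for s, y, w in stints if player in s]
    off = [(y, w) for s, y, w in stints if player not in s]
    return avg(on) - avg(off)


def fit_adjusted(stints, players, ridge=1e-6):
    """Weighted least squares: opp FG% ~ intercept + one term per player on court."""
    cols = ["intercept"] + list(players)
    rows = [[1.0] + [1.0 if p in s else 0.0 for p in players]
            for s, _, _ in stints]
    n = len(cols)
    xtx = [[sum(w * r[i] * r[j] for r, (_, _, w) in zip(rows, stints))
            for j in range(n)] for i in range(n)]
    xty = [sum(w * r[i] * y for r, (_, y, w) in zip(rows, stints))
           for i in range(n)]
    for i in range(1, n):
        xtx[i][i] += ridge  # small penalty keeps near-collinear lineups solvable
    return dict(zip(cols, solve(xtx, xty)))


# Made-up stints: (defenders on court, opposing 3-pt FG%, attempts as weight).
# A and B almost never share the floor, as with the two players above.
stints = [
    ({"A"}, 0.40, 45),
    ({"B"}, 0.30, 45),
    ({"A", "B"}, 0.30, 5),
    (set(), 0.40, 5),
]

print(naive_on_off(stints, "A"))
print(fit_adjusted(stints, ["A", "B"]))
```

With these toy numbers the naive on/off gap for A comes out around +0.08, which makes A look like a liability, while the fit assigns A a coefficient near zero and B one near -0.10. The few stints where the two overlap (or both sit) are what let the regression tell them apart, so the less overlap there is, the less trustworthy the split becomes.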
