I’m finally ready to share a big project I’ve been working on for months. Try selecting a player from the drop-down menu below. The visualization that appears shows the defensive impact a player has after adjusting for all other defensive players and for the offensive player who is actually shooting. Blue squares indicate that when the selected player is on the court, the probability that an offensive player will make his shot declines (so blue is good defense). Red squares indicate that when the selected player is on the court, the probability of any offensive player making his shot rises. Go ahead and take a second to play with the drop-down menu. I’m not ready to roll out every player yet, but the Pacers, Bulls, and Grizzlies are all on there. After the jump, I explain the full methodology and answer some questions that you’ll probably have after looking at the charts.
Adjusted Defensive Impact by Court Location
First, let me explain roughly how I came up with this. It will help if you are familiar with how adjusted plus minus works. Adjusted plus minus uses linear regression to adjust regular plus minus. A single observation in an adjusted plus minus data set is a “shift.” A shift is just a period of play in which no player substitutions are made. So a shift could be 10 seconds (as often happens late in games, when substitutions are frequent), or it could be 10 minutes.
For my model, I abandon the shift and instead use a single shot as the unit of observation. For every shot, I know all the players on the court. I also work out the shooter’s FG% from the specific location where the shot is taken. So, if the shot is taken by Dwight Howard and it is right under the basket, the offensive FG% is probably something like 60%. If Dwight Howard takes a shot from 20 feet out, however, his FG% is going to be under 40%. I run a regression model that looks like this:
Shot_made = Offensive_FG% + Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …
Where Shot_made is a variable that indicates whether or not the shot went in (0 if it didn’t, 1 if it did), Offensive_FG% is the shooter’s FG% from the location the shot is being taken at, Home is a variable that indicates if the defense is at home or not, and the Defensive_P* variables are just variables that indicate whether or not a particular player is on the court.1 You may notice I’ve omitted an intercept. Offensive_FG% really ought to be the intercept. Including an intercept would provide for a team effect that is independent of any one player. That might be an appropriate thing to do and it’s something I want to explore in the future. Furthermore, I want the coefficient on Offensive_FG% to be constrained to 1, so I rearrange and the model I actually run is this one (using OLS):
Shot_made – Offensive_FG% = Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …
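To make the rearranged model concrete, here is a minimal Python sketch. It is not the code behind the charts. It assumes each shot is a small record with hypothetical field names (`made`, `shooter_fg_pct`, `defense_home`, `defenders`), builds the design matrix without an intercept, and solves the OLS normal equations with a hand-rolled Gaussian elimination so it runs without third-party libraries. The defenders and numbers in the toy data are made up.

```python
def build_design(shots, players):
    """Return (X, y) for the rearranged model.

    Each row of X is [Home, Defensive_P1, ..., Defensive_Pk] and
    y is Shot_made - Offensive_FG%. There is no intercept column.
    The field names on the shot records are hypothetical.
    """
    X, y = [], []
    for shot in shots:
        row = [float(shot["defense_home"])]
        row += [1.0 if p in shot["defenders"] else 0.0 for p in players]
        X.append(row)
        y.append(shot["made"] - shot["shooter_fg_pct"])
    return X, y


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x


def ols_no_intercept(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


# Toy data: three hypothetical defenders and six made-up shots.
players = ["Defender A", "Defender B", "Defender C"]
shots = [
    {"made": 1, "shooter_fg_pct": 0.60, "defense_home": 1, "defenders": {"Defender A"}},
    {"made": 0, "shooter_fg_pct": 0.38, "defense_home": 0, "defenders": {"Defender B"}},
    {"made": 0, "shooter_fg_pct": 0.45, "defense_home": 1, "defenders": {"Defender C"}},
    {"made": 1, "shooter_fg_pct": 0.35, "defense_home": 0, "defenders": {"Defender A", "Defender B"}},
    {"made": 0, "shooter_fg_pct": 0.55, "defense_home": 1,
     "defenders": {"Defender A", "Defender B", "Defender C"}},
    {"made": 1, "shooter_fg_pct": 0.42, "defense_home": 0, "defenders": {"Defender A", "Defender C"}},
]

X, y = build_design(shots, players)
coefs = ols_no_intercept(X, y)
for name, value in zip(["Home"] + players, coefs):
    print(f"{name}: {value:+.3f}")
```

With six shots the estimates are meaningless, of course. The point is only the shape of the problem: subtracting the shooter’s FG% on the left-hand side is what pins its coefficient at 1, and every remaining column is a 0/1 indicator.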
The model above is exactly how adjusted plus minus is calculated, just with a different unit of observation. But I don’t just run one regression! Instead, I run 1,750 regressions, one for each 1ft x 1ft area on the court (I don’t go all the way out to mid-court, so it’s a 50ft x 35ft area). In each regression, I include only shots taken inside or near that 1ft x 1ft area.2 The coefficients returned can be interpreted simply as the change, in percentage points, that a defensive player makes in opposing FG% at that particular location.
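The per-location loop can be sketched as follows. This is a rough Python illustration, not the actual pipeline. The 12-foot radius, the same-shot-type rule, and the 8-foot start of mid-range come from the footnotes. Everything else is my assumption for the sketch: the coordinate system (feet, baseline along y = 0, hoop at x = 25, y = 5.25), the three-point geometry (23.75ft arc, corner lines 22ft to either side of the hoop), the `x`/`y` field names, and the choice to classify each cell by its center.

```python
import math

BASKET_X, BASKET_Y = 25.0, 5.25  # assumed hoop position, in feet
CLOSE_MAX_FT = 8.0               # from the footnote: mid-range is >8 feet
ARC_RADIUS_FT = 23.75            # assumed three-point arc radius
CORNER_DX_FT = 22.0              # assumed lateral distance to the corner three line
NEAR_RADIUS_FT = 12.0            # from the footnote: "near" means within 12 feet


def shot_type(x, y):
    """Classify a court location as 'close', 'mid-range', or '3pt'."""
    dx = abs(x - BASKET_X)
    dist = math.hypot(dx, y - BASKET_Y)
    if dx >= CORNER_DX_FT or dist >= ARC_RADIUS_FT:
        return "3pt"
    if dist > CLOSE_MAX_FT:
        return "mid-range"
    return "close"


def cell_centers(width_ft=50, depth_ft=35):
    """Yield the center of every 1ft x 1ft cell in the 50ft x 35ft area."""
    for ix in range(width_ft):
        for iy in range(depth_ft):
            yield ix + 0.5, iy + 0.5


def shots_for_cell(shots, cx, cy):
    """Shots within 12 feet of the cell center that share its shot type."""
    cell_type = shot_type(cx, cy)
    return [
        s for s in shots
        if shot_type(s["x"], s["y"]) == cell_type
        and math.hypot(s["x"] - cx, s["y"] - cy) <= NEAR_RADIUS_FT
    ]


# Two made-up shots: one at the rim, one from mid-range.
example_shots = [{"x": 25.0, "y": 6.0}, {"x": 25.0, "y": 16.0}]
n_cells = 0
for cx, cy in cell_centers():
    subset = shots_for_cell(example_shots, cx, cy)
    # A separate regression would be fit on `subset` here.
    n_cells += 1
print(n_cells)
```

Because neighboring cells share most of their shots, the resulting estimates vary smoothly from one square to the next rather than jumping around.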
And that’s it! Pretty simple. Ok, let’s have a round of questions.
What are the gray boxes?
These are statistically insignificant, but not by the traditional measure (95% confidence). If you’ve ever looked at adjusted plus minus estimates alongside their errors, you’ll notice that only a handful of players have statistically significant estimates. Getting significance in these kinds of applications is hard! So I’ve adopted an extremely forgiving definition of significance here: p<0.5, which means a box goes gray when p>0.5. Yes, that’s a 50% confidence interval.3 You can interpret the gray boxes as meaning “this player doesn’t really affect shots from here.” That’s not really accurate (failure to reject the null yada yada) but it’s good enough.4
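Here is roughly what the gray-box rule looks like in Python. This is a sketch under two assumptions of mine: each cell’s regression yields a coefficient and a standard error, and the p-value is taken from a normal approximation rather than the t distribution that OLS output would normally report. The function names are hypothetical.

```python
import math


def two_sided_p(coef, std_err):
    """Two-sided p-value for coef / std_err under a normal approximation."""
    z = abs(coef / std_err)
    return math.erfc(z / math.sqrt(2.0))


def box_color(coef, std_err, alpha=0.5):
    """Gray if not significant at the forgiving 0.5 level, else blue or red."""
    if two_sided_p(coef, std_err) >= alpha:
        return "gray"
    # A negative coefficient means opposing FG% drops: good defense.
    return "blue" if coef < 0 else "red"


print(box_color(-0.05, 0.02))  # strong negative estimate
print(box_color(0.01, 0.05))   # estimate much smaller than its error
```

Under the normal approximation, a p-value below 0.5 corresponds to an estimate that is more than about 0.67 standard errors away from zero, which shows just how forgiving the cutoff is.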
How come Marc Gasol/Paul George/Joakim Noah doesn’t look that great?
According to most people’s “eye test” and APM or ridge-regression APM approaches, Marc Gasol is one of the best defenders in the NBA. But my chart isn’t the only piece of data that confuses the story. Seth Partnow discusses some other conflicting evidence, for example. How can Marc Gasol be a poor rim protector, not have a lot of blue on his chart, and still be a great defender? I can’t answer this definitively, but the most likely explanation, to my mind, is that Marc Gasol is a great defender because he does things that have nothing to do with challenging shots, like creating turnovers. Remember I’m only looking at shots. I have no idea if someone is creating steals, offensive fouls, or other types of turnovers. And those things are very valuable defensively. Taking away the opportunity to ever shoot is better than forcing a low % shot.
Joakim Noah is another interesting case because the APM numbers and RAPM numbers for him are not that great, so I’m not alone on this one either. My chart certainly doesn’t make him look bad–he seems to be a very good rim protector, but given that he just won DPOY, you’d expect to see more blue.
I don’t know what’s going on with Paul George but it’s worth mentioning that the Pacers’ defense declined pretty badly after the all-star break.
Ok then how come Kosta Koufos looks amazing?
Good question. APM/RAPM say he’s decent but not spectacular. I’d have to look at SportVU data or something similar to really answer this. Maybe Koufos is fantastic at contesting shots but can’t create turnovers to save his life. I really don’t know.
In an earlier post you hypothesized that good rim protection would improve perimeter defense too.
Let me backtrack a bit. One of my first posts on this blog was about Roy Hibbert, and I said that a team with a good rim protector would probably also have great perimeter D. Why? Because perimeter defensive players can play closer to their man when they know that guards who drive are headed straight for Roy Hibbert. I still think that must be true, but you can’t really see it in the case of Hibbert here. The perimeter actually gets hotter. It doesn’t get a lot hotter though, and Taj Gibson is a bit of a counterpoint. Overall though, I have to admit that I don’t know what the full story is here.
Why not use logistic regression?
Or, for that matter, some other classification technique, since this is just a classification problem. The main reason is interpretability. I wanted coefficients that are easy to analyze. Another reason is that if you really think about what the functional form of the logit link implies in this particular case, it seems like an odd fit. I have no reason to think that there’s a nonlinear relationship between a player’s FG% and whether or not they made a basket. That would be weird! That said, I’d like to test other methods and use ROC curves or PRE to try to get a feel for whether or not some are better than others.
Why not adjust for all offensive players?
Offensive players impact plays when they are not the shooter, so this would obviously be the more correct thing to do. Unfortunately, it would also eat up a ton of degrees of freedom. For some of the midrange points, the model is being run with only a hundred or so observations. That means you literally can’t include all offensive players–you would need a couple hundred degrees of freedom for that and I don’t have them.
What does this tell us about how a defensive player affects the location of shots?
Not much, and this is another caveat that is appropriate for the whole Marc Gasol/Paul George/Joakim Noah thing. A good defensive player forces shots into the midrange. Even if those shots go in at better than average rates, that player might still be an effective defender. The size of the boxes kinda helps you see how the distribution is different for each player, but it’s not really a good way to see this. I am working on some ways of getting at this but to be honest I don’t have any ideas that I really like right now.
Why even do this?
This is not your average NBA stat in the sense that it is probably not helpful for predicting much of anything.5 I don’t consider that necessary though. To me, this is a really interesting way to dissect defense and try to explain why a particular player is or isn’t effective (we are learning something about Marc Gasol with these data points, for example–not that he is a bad defender, because we know pretty definitively that he is a very good defender, but rather that he is a different kind of defender).
Some other question
Ask me in comments!
1. I run this by team, so if I perform this regression for the Pacers, for example, there will be 8 Defensive_P* variables: one for each Pacers player who has defended at least 1,000 shots.
2. What does near mean? It means within 12 feet as long as the shot is of the same type: either close, mid-range, or 3pt. Mid-range is defined as >8 feet from the basket.
3. Trust me, this is pretty good for an adjusted stat like this.
4. I am playing with other ways to represent this. For example, I could show 3 charts, one with the point estimates, one with the lower bounds, and one with the upper bounds. I was worried that would be too cluttered and confusing for this first attempt so I compromised on gray boxes.
5. Although I have some thoughts about this that I will hopefully get to later.