A new way to think about NBA player defense

I’m finally ready to share a big project I’ve been working on for months. Try selecting a player from the drop-down menu below. The visualization that appears shows the defensive impact a player has, adjusting for all other defensive players and for the offensive player who is actually shooting. Blue squares indicate that when the selected player is on the court, the probability that an offensive player will make his shot declines (so blue is good defense). Red squares indicate that when the selected player is on the court, the probability of an offensive player making his shot rises. Go ahead and take a second to play with the drop-down menu. I’m not ready to roll out every player yet, but the Pacers, Bulls, and Grizzlies are all on there. After the jump, I explain the full methodology and answer some questions you’ll probably have after looking at the charts.

Adjusted Defensive Impact by Court Location

First, let me explain roughly how I came up with this. It will help if you are familiar with how adjusted plus minus works. Adjusted plus minus uses linear regression to adjust regular plus minus for the other players on the court. A single observation in an adjusted plus minus data set is a “shift,” which is just a period of play during which no substitutions are made. So a shift could be 10 seconds (which is common late in games, when substitutions are frequent), or it could be 10 minutes.
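To make the idea of a shift concrete, here is a minimal sketch, assuming play-by-play events that carry a timestamp and a kind label. The `Event` class and the `"sub"` label are hypothetical, not a real data format. It cuts a game into intervals at each substitution and ignores quarter breaks and other real-world details.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float  # seconds elapsed in the game (hypothetical field)
    kind: str    # "sub" for a substitution, anything else otherwise (hypothetical label)


def split_into_shifts(events: list[Event], game_end: float) -> list[tuple[float, float]]:
    """Return (start, end) intervals during which no substitution occurs."""
    boundaries = [0.0]
    for event in sorted(events, key=lambda e: e.time):
        # Several substitutions at the same dead ball start only one new shift.
        if event.kind == "sub" and event.time > boundaries[-1]:
            boundaries.append(event.time)
    boundaries.append(game_end)
    return [(start, end) for start, end in zip(boundaries, boundaries[1:]) if end > start]


events = [Event(0, "shot"), Event(10, "sub"), Event(10, "sub"), Event(610, "sub")]
print(split_into_shifts(events, 2880))  # → [(0.0, 10), (10, 610), (610, 2880)]
```

The toy game above yields a 10-second shift, then a 10-minute shift, then one long final shift.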

For my model, I abandon the shift and instead use a single shot as the unit of observation. For every shot, I know all the players on the court. Using this information, I figure out the FG% of the shooter from the specific location of the shot. So if Dwight Howard takes a shot right under the basket, the offensive FG% is probably something like 60%. If Dwight Howard takes a shot from 20 feet out, however, his FG% is going to be under 40%. I run a regression model that looks like this:
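A minimal sketch of the Offensive_FG% lookup, assuming each shot has already been assigned to a location bucket. The zone labels and the toy shots below are hypothetical; the post doesn’t say exactly how locations are binned.

```python
from collections import defaultdict


def fg_pct_by_location(shots):
    """shots: iterable of (shooter, zone, made) with made in {0, 1}.
    Returns {(shooter, zone): FG%} as a fraction."""
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for shooter, zone, made in shots:
        attempts[(shooter, zone)] += 1
        makes[(shooter, zone)] += made
    return {key: makes[key] / attempts[key] for key in attempts}


# Toy data (hypothetical): 2 of 3 at the rim, 1 of 4 from 20 feet.
shots = [
    ("Howard", "rim", 1), ("Howard", "rim", 1), ("Howard", "rim", 0),
    ("Howard", "20ft", 0), ("Howard", "20ft", 1), ("Howard", "20ft", 0), ("Howard", "20ft", 0),
]
fg = fg_pct_by_location(shots)
print(fg[("Howard", "20ft")])  # → 0.25
```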

Shot_made = Offensive_FG% + Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …

Where Shot_made indicates whether or not the shot went in (0 if it didn’t, 1 if it did), Offensive_FG% is the shooter’s FG% from the location the shot is taken at, Home indicates whether the defense is at home, and the Defensive_P* variables indicate whether or not a particular defensive player is on the court.1 You may notice I’ve omitted an intercept. Offensive_FG% is really what ought to serve as the intercept. Including an intercept would provide for a team effect that is independent of any one player. That might be an appropriate thing to do, and it’s something I want to explore in the future. Furthermore, I want the coefficient on Offensive_FG% to be constrained to 1, so I move it to the left-hand side, and the model I actually run (using OLS) is this one:

Shot_made – Offensive_FG% = Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …
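The rearranged model can be sketched as ordinary least squares with no intercept column: the target is Shot_made minus Offensive_FG%, and the columns are the Home indicator plus one 0/1 indicator per defender. This is a dependency-free sketch under those assumptions; a small Gaussian-elimination solver for the normal equations stands in for a statistics package, and the function names and row layout are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few shots for this many columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_defensive_impact(rows):
    """rows: list of (made, offensive_fg, indicators), indicators = [home, p1, p2, ...].
    Fits made - offensive_fg = indicators . beta by OLS with no intercept."""
    X = [ind for _, _, ind in rows]
    y = [made - fg for made, fg, _ in rows]  # coefficient on Offensive_FG% fixed at 1
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy rows (hypothetical): columns are [Home, Defender A, Defender B].
rows = [
    (1, 0.60, [1, 1, 0]),
    (0, 0.45, [0, 1, 1]),
    (1, 0.38, [1, 0, 1]),
    (0, 0.52, [0, 1, 0]),
    (1, 0.41, [1, 1, 1]),
]
print(fit_defensive_impact(rows))
```

Each returned coefficient is on the same scale as the target, so a value of -0.05 for a defender would mean opponents shoot about 5 percentage points worse with him on the floor.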

This is exactly how adjusted plus minus is calculated, just with a different unit of observation. But I don’t just run one regression! Instead, I run 1,750 regressions, one for each 1ft x 1ft area on the court (I don’t go all the way out to mid-court, so it’s a 50ft x 35ft area). Each regression includes only shots taken inside or near that 1ft x 1ft area.2 Because the dependent variable is on the probability scale, the coefficients can be interpreted simply as the percentage-point change that a defensive player makes in opposing FG% at that particular location.
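The per-location loop can be sketched as follows, using the “near” rule from footnote 2 (within 12 feet and the same shot type, with mid-range meaning more than 8 feet from the basket). The coordinate system (basket at (25, 5.25) on a grid 50 feet wide and 35 feet deep), the dictionary keys, and the caller-supplied `cell_type_of` function are all assumptions for illustration. Each cell’s sample would then feed its own regression.

```python
import math

BASKET = (25.0, 5.25)  # assumed: x runs across the 50-ft width, y up from the baseline


def shot_type(x, y, is_three):
    """Classify a shot as "3pt", "mid" (> 8 ft from the basket), or "close"."""
    if is_three:
        return "3pt"
    return "mid" if math.dist((x, y), BASKET) > 8 else "close"


def all_cells():
    """The 50 x 35 = 1,750 one-foot cells, keyed by their lower-left corner."""
    return [(cx, cy) for cx in range(50) for cy in range(35)]


def shots_near(cell, cell_type, shots, radius=12.0):
    """Shots within `radius` feet of the cell center that share the cell's type.
    shots: list of dicts with hypothetical keys "x", "y", and "type"."""
    center = (cell[0] + 0.5, cell[1] + 0.5)
    return [
        s for s in shots
        if s["type"] == cell_type and math.dist((s["x"], s["y"]), center) <= radius
    ]


def per_cell_samples(shots, cell_type_of):
    """Map every cell to the shots used in its regression; cell_type_of is a
    caller-supplied (hypothetical) function giving each cell's shot type."""
    return {cell: shots_near(cell, cell_type_of(cell), shots) for cell in all_cells()}
```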

And that’s it! Pretty simple. Ok, let’s have a round of questions.

What are the gray boxes?
These are cells where the estimate is statistically insignificant, though not by the traditional measure (95% confidence). If you’ve ever looked at adjusted plus minus with standard errors, you’ll notice that only a handful of players have statistically significant estimates. Getting significance in these kinds of applications is hard! So I’ve adopted an extremely forgiving definition of significance here: an estimate counts as significant if p < 0.5, and the gray boxes are the ones with p > 0.5. Yes, that’s a 50% confidence interval.3 You can interpret the gray boxes as meaning “this player doesn’t really affect shots from here.” That’s not strictly accurate (failure to reject the null yada yada), but it’s good enough.4
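The coloring rule can be sketched like this. It assumes each cell’s regression yields a coefficient and a standard error, and it uses a normal approximation for the two-sided p-value (swapped in for simplicity; a regression package would typically use the t distribution). Negative coefficients, meaning lower opponent FG%, are blue; positive ones are red; anything with p > 0.5 is gray.

```python
import math


def two_sided_p(coef, se):
    """Two-sided p-value for H0: coef == 0 under a normal approximation."""
    z = abs(coef) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def cell_color(coef, se, alpha=0.5):
    """Gray if p > alpha (0.5 in the post), else blue for lower opponent FG%, red for higher."""
    if two_sided_p(coef, se) > alpha:
        return "gray"
    return "blue" if coef < 0 else "red"


print(cell_color(-0.05, 0.02))  # → blue
print(cell_color(0.01, 0.05))   # → gray
```

Under this approximation, a cell turns gray whenever the coefficient is smaller than about 0.674 standard errors in either direction.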

How come Marc Gasol/Paul George/Joakim Noah doesn’t look that great?
According to most people’s “eye test” and APM or ridge-regression APM approaches, Marc Gasol is one of the best defenders in the NBA. But my chart isn’t the only piece of data that confuses the story. Seth Partnow discusses some other conflicting evidence, for example. How can Marc Gasol be a poor rim protector, not have a lot of blue on his chart, and still be a great defender? I can’t answer this definitively, but the most likely explanation, to my mind, is that Marc Gasol is a great defender because he does things that have nothing to do with challenging shots, like creating turnovers. Remember I’m only looking at shots. I have no idea if someone is creating steals, offensive fouls, or other types of turnovers. And those things are very valuable defensively. Taking away the opportunity to ever shoot is better than forcing a low % shot.

Joakim Noah is another interesting case because his APM and RAPM numbers are not that great, so I’m not alone on this one either. My chart certainly doesn’t make him look bad: he seems to be a very good rim protector. But given that he just won DPOY, you’d expect to see more blue.

I don’t know what’s going on with Paul George, but it’s worth mentioning that the Pacers’ defense declined pretty badly after the All-Star break.

Ok then how come Kosta Koufos looks amazing?
Good question. APM/RAPM say he’s decent but not spectacular. I’d have to look at SportVU data or something similar to really answer this. Maybe Koufos is fantastic at contesting shots but can’t create turnovers to save his life. I really don’t know.

In an earlier post you hypothesized that good rim protection would improve perimeter defense too.
Let me backtrack a bit. One of my first posts on this blog was about Roy Hibbert, and I said that a team with a good rim protector would probably also have great perimeter D. Why? Because perimeter defenders can play closer to their man when they know that guards who drive are headed straight for Roy Hibbert. I still think that must be true, but you can’t really see it in the case of Hibbert here. The perimeter actually gets hotter. It doesn’t get a lot hotter, though, and Taj Gibson is a bit of a counterpoint. Overall, I have to admit that I don’t know the full story here.

Why not use logistic regression?
Or, for that matter, some other classification technique, since this is essentially a classification problem. The main reason is interpretability: I wanted easy-to-interpret coefficients. Another reason is that if you really think about what the functional form of the logit link implies in this particular case, it seems like an odd fit. I have no reason to think that there’s a nonlinear relationship between a player’s FG% and whether or not he makes a basket. That would be weird! That said, I’d like to test other methods and use ROC curves or PRE (proportional reduction in error) to get a feel for whether some are better than others.
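Here is a rough sketch of the two comparison metrics mentioned above, assuming each candidate model outputs a predicted make probability for every shot. AUC is computed as the probability that a randomly chosen make gets a higher prediction than a randomly chosen miss, which equals the area under the ROC curve. PRE uses one common definition: the proportional reduction in classification errors relative to always guessing the more common outcome, with an assumed 0.5 cutoff.

```python
def roc_auc(probs, outcomes):
    """Area under the ROC curve: the chance a random make is scored above a
    random miss, counting ties as half (O(n^2), fine for small samples)."""
    makes = [p for p, y in zip(probs, outcomes) if y == 1]
    misses = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in makes for q in misses)
    return wins / (len(makes) * len(misses))


def pre(probs, outcomes, cutoff=0.5):
    """Proportional reduction in error relative to always guessing the modal outcome."""
    n = len(outcomes)
    modal = 1 if 2 * sum(outcomes) >= n else 0
    null_errors = sum(1 for y in outcomes if y != modal)
    if null_errors == 0:
        raise ValueError("PRE is undefined when every shot has the same outcome")
    model_errors = sum(1 for p, y in zip(probs, outcomes) if (p >= cutoff) != (y == 1))
    return (null_errors - model_errors) / null_errors


probs = [0.9, 0.8, 0.3, 0.2]  # toy predictions (hypothetical)
outcomes = [1, 0, 1, 0]
print(roc_auc(probs, outcomes), pre(probs, outcomes))  # → 0.75 0.0
```

Running both metrics on the OLS fit and on a logistic fit over the same shots would give a side-by-side comparison.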

Why not adjust for all offensive players?
Offensive players affect plays even when they are not the shooter, so this would obviously be the more correct thing to do. Unfortunately, it would also eat up a ton of degrees of freedom. For some of the midrange locations, the model is being run with only a hundred or so observations. That means you literally can’t include all offensive players: you would need a couple hundred degrees of freedom for that, and I don’t have them.
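The degrees-of-freedom arithmetic can be written out using the Pacers example from footnote 1 (Home plus 8 defenders) and the roughly 100 shots mentioned for some midrange cells. The number of offensive-player columns, written p_off, is left symbolic.

```latex
\begin{aligned}
k &= \underbrace{1}_{\text{Home}} + \underbrace{8}_{\text{defenders}} + p_{\text{off}},\\
\text{df}_{\text{resid}} &= n - k \approx 100 - 9 - p_{\text{off}} = 91 - p_{\text{off}}.
\end{aligned}
```

Once k exceeds n, the matrix X'X is singular and OLS has no unique solution, so in such a cell at most about 91 offensive-player indicators could fit at all, and the estimates would become very noisy well before that point.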

What does this tell us about how a defensive player affects the location of shots?
Not much, and this is another caveat that is appropriate for the whole Marc Gasol/Paul George/Joakim Noah thing. A good defensive player forces shots into the midrange. Even if those shots go in at better than average rates, that player might still be an effective defender. The size of the boxes kinda helps you see how the distribution is different for each player, but it’s not really a good way to see this. I am working on some ways of getting at this but to be honest I don’t have any ideas that I really like right now.

Why even do this?
This is not your average NBA stat, in the sense that it probably isn’t helpful for predicting much of anything.5 I don’t consider that necessary, though. To me, this is a really interesting way to dissect defense and try to explain why a particular player is or isn’t effective. We are learning something about Marc Gasol from these data points, for example: not that he is a bad defender, because we know pretty definitively that he is a very good defender, but rather that he is a different kind of defender.

Some other question?
Ask me in comments!

  1. I run this by team, so if I perform this regression for the Pacers, for example, there will be 8 Defensive_P* variables: one for each Pacers player who has defended at least 1,000 shots.
  2. What does “near” mean? It means within 12 feet, as long as the shot is of the same type: close, mid-range, or 3pt. Mid-range is defined as more than 8 feet from the basket.
  3. Trust me, this is pretty good for an adjusted stat like this.
  4. I am playing with other ways to represent this. For example, I could show 3 charts, one with the point estimates, one with the lower bounds, and one with the upper bounds. I was worried that would be too cluttered and confusing for this first attempt so I compromised on gray boxes.
  5. Although I have some thoughts about this that I will hopefully get to later.

7 thoughts on “A new way to think about NBA player defense”

  1. yo nice work kid, i do much of the same in my spare time.

    one major gripe, and it’s not about your work, but rather your analysis which follows

    you ask “How come Paul George doesn’t look that great?”
    ummm… he DOES look that great. Look at Mike Dunleavy (considered to be a below-average defender in a top-notch defensive system). Observe the frequency and size of yellow-to-red squares. yup, makes sense. Compare to Paul George. George has far fewer yellow-to-red squares, and they are much smaller. Dunleavy has more blue squares, but that in itself is telling.
    Opponents just get up fewer shots against George. He does a better job of denying his man the ball, and forcing him to give it up. Those are the marks of a stellar defender. (you kind of touch on this with Gasol)
    Your pictures are super cool, and provide a lot of insight. However you can’t just make random statements like “it’s worth mentioning that the Pacers’ defense declined pretty badly after the all-star break.” Is it worth mentioning? Are you suggesting that George may have only appeared to be a good defender in a small early season sample and then regressed to a mediocre defender? Anyone familiar with “Defensive Win Shares” may have a word or two for you in that regard…

  2. Paul George’s numbers have declined a bit though, if you look at his defensive RAPM for example. Clearly that’s not the whole story, because he’s still a very good defender on the season and as you mention Indiana is fantastic at forcing poor shots. Thanks for reading!

  3. Can you do this for shot location frequency as well? I imagine it’s similar data, and getting opponent players to take a bunch of shots from mid range, (even if they shoot a little better than average from there) is a positive defensively.

    • This is at the top of my to-do list. It may not be as specific as this: right now I’m thinking I might just show how each player affects 3pt vs. midrange vs. close shots rather than locations all over the court, but I’m hoping I can work it into my post later this week where I expand to all NBA players.

  4. For players such as Roy Hibbert and Marc Gasol, it might be that the number of shots taken inside the paint while they are in the game is far lower. For instance, if only 10 shots are taken from a specific spot while Gasol is in the game, but most of those come off breaks and wide-open layups, a player might shoot a higher percentage. Maybe if, say, Koufos is in the game, opposing teams attack the rim more and get 15-20 shots from the same spot; he is still a seven-footer, so those 5-10 extra contested shots go in at a lower percentage than wide-open shots, and probably at a lower percentage than long jumpers too. This might be part of the reason great interior defenders look worse inside: a smaller share of shots is taken from the high-quality areas (around the basket and the corners) while they are in the game.

  5. Pingback: Further thoughts and clarifications on adjusted defensive impact | Austin Clemens

  6. Pretty cool.
    A couple of questions:
    – are you controlling for who is on offense? (you should)
    – are you using OLS or ridge regression (OLS is awful here, for sure)
