Methods Post

This is just a catch-all post about methods that I will reference in the future when I post a graph or a regression or whatever. My plan is to update this every time I add something new that I think requires further explanation. So without further ado…

Shot-chart Visualizations
Adjusted defensive impact by court location

Shot-chart Visualizations

From here on out, the way I do the shot-chart visualizations should be fairly stable. There are really only a couple of things that need explanation here. Data is usually current as of the date of the blog post but does not update automatically, so backdate appropriately. All shots taken against a team or by a player or whatever it is are grouped into 1ft x 1ft squares that cover the court. 2 and 3 point shots are not mixed in this process. Basically if a square’s center is inside the arc, it should contain only 2pt shots, and if the square’s center is outside the arc it should contain only 3pt shots.
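A rough sketch of the grouping step, not the exact code behind the charts. The coordinate system (basket at the origin, distances in feet), the arc dimensions (23.75 ft, with corners 22 ft out), and the choice to drop a shot whose type disagrees with its square are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Assumed geometry (not from the post): basket at the origin, arc radius
# 23.75 ft, corner three-point lines 22 ft to either side of the basket.
ARC_RADIUS_FT = 23.75
CORNER_X_FT = 22.0


def is_three_point_location(x, y):
    """True if (x, y), in feet relative to the basket, is outside the arc."""
    return abs(x) > CORNER_X_FT or math.hypot(x, y) > ARC_RADIUS_FT


def bin_shots(shots):
    """Group (x, y, made, is_three) shots into 1ft x 1ft squares.

    A shot is kept only if its type matches the type implied by the center
    of its square, so 2pt and 3pt shots never share a square. Dropping the
    mismatches is an assumption; reassigning them would also be possible.
    """
    squares = defaultdict(list)
    for x, y, made, is_three in shots:
        col, row = math.floor(x), math.floor(y)
        center_is_three = is_three_point_location(col + 0.5, row + 0.5)
        if is_three == center_is_three:
            squares[(col, row)].append(made)
    return dict(squares)
```

Each key is the lower-left corner of a square, and each value is the list of makes (1) and misses (0) recorded in it.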

Coloring each square is a little trickier. Rather than use set regions, each square is colored according to FG% inside the square and FG% near the square in a circular area. The goal is to base FG% on at least 8% of the data. For each square that is displayed, shots close to that square are grouped until either 8% of all shots are being used or a 7 foot circular area around the square is reached. Most midrange squares will use this 7-foot range fully and still end up with only 50-100 shots to base the FG% on. Interior squares, however, may not need to draw from the neighboring area at all, or may only need to draw from a very small surrounding area. The graph below illustrates.
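Here is a minimal sketch of that widening search. It assumes the 7 feet is a radius measured from the square's center, that neighboring shots are added nearest-first, and that `shots` has already been filtered to a single shot type. The function name and arguments are hypothetical.

```python
import math


def local_fg_pct(square, shots, total_shots, min_share=0.08, max_radius_ft=7.0):
    """Return (FG%, shots used) for one 1ft x 1ft square.

    `square` is the (col, row) of the square's lower-left corner and `shots`
    is a list of (x, y, made) tuples. Shots inside the square are always
    used; neighbors are then added nearest-first until `min_share` of
    `total_shots` is reached or the next shot is beyond `max_radius_ft`.
    """
    col, row = square
    cx, cy = col + 0.5, row + 0.5
    used, neighbors = [], []
    for x, y, made in shots:
        if math.floor(x) == col and math.floor(y) == row:
            used.append(made)
        else:
            neighbors.append((math.hypot(x - cx, y - cy), made))

    needed = min_share * total_shots
    for dist, made in sorted(neighbors):
        if len(used) >= needed or dist > max_radius_ft:
            break
        used.append(made)

    if not used:
        return None, 0
    return sum(used) / len(used), len(used)
```

A busy interior square can hit the 8% target on its own shots and never look outward, while a sparse midrange square stops at the 7-foot limit with whatever it has collected.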

Opposing FG% compared to league average, Washington Wizards

Circles are drawn for each square in the graph to show the area that square is using to calculate a FG%. The circles are inaccurate in the sense that 3pt squares never try to draw from inside the arc, so those should really look like circles that end abruptly at the arc. Likewise, 2pt squares never try to draw from outside the arc.

In comparison charts like the one above, where the FG% is being compared to the league average, the reddest squares indicate that the FG% is at least 7 percentage points above league average and the bluest indicate that the FG% is at least 7 percentage points below league average. Each shade is a 1-point increment. For charts that are not comparisons, FG% ranges from 30% to over 60% in 2-point increments.
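One way to turn those scales into shade indices is sketched below. How values are rounded into a shade, and the decision to put anything under 30% in the lowest shade, are assumptions; FG% is passed as a fraction (0.45 for 45%).

```python
def comparison_shade(fg_pct, league_fg_pct):
    """Shade index from -7 (bluest) to +7 (reddest), one step per point."""
    diff_points = round((fg_pct - league_fg_pct) * 100)
    return max(-7, min(7, diff_points))


def absolute_shade(fg_pct):
    """Shade index from 0 (30% or lower) to 15 (60% or higher), 2 points per step."""
    step = int((fg_pct * 100 - 30) // 2)
    return max(0, min(15, step))
```

The clamping is what makes the end colors mean "at least this far from average" rather than an exact value.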

The number on each square is the points per shot for that square, which is just FG% in the square x point value of shots in the square.
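As a quick illustration of that arithmetic (the function name is made up for this example, and FG% is passed as a fraction):

```python
def points_per_shot(fg_pct, point_value):
    """Expected points per attempt: FG% (as a fraction) times shot value."""
    return fg_pct * point_value
```

So a 3pt square where shooters hit 40% is worth 1.2 points per shot, the same as a 2pt square where they hit 60%.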

Adjusted defensive impact by court location

I originally laid this all out here and here. Adjusted defensive impact by court location is an attempt to ascertain a player’s defensive impact on shots taken all over the court, controlling for the other defensive players and the offensive player’s FG% from that particular court location. Like adjusted plus minus, it uses regression to adjust game events.

The unit of observation is a single shot. For every shot taken in the NBA season, I know all the players on the court. Using this information, I figure out the FG% of the player taking the shot from the specific location he is taking the shot. So, if the shot is taken by Dwight Howard and it is right under the basket, the offensive FG% is probably something like 60%. If Dwight Howard takes a shot from 20 feet out however, his FG% is going to be under 40%. I run a regression model that looks like this:

Shot_made = Offensive_FG% + Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …

Here Shot_made indicates whether or not the shot went in (0 if it didn’t, 1 if it did), Offensive_FG% is the shooter’s FG% from the location the shot is taken from, Home indicates whether the defense is at home, and the Defensive_P* variables indicate whether or not a particular player is on the court (see footnote 1). You may notice I’ve omitted an intercept. Offensive_FG% really ought to play the role of the intercept. Including a separate intercept would add a team effect that is independent of any one player. That might be an appropriate thing to do and it’s something I want to explore in the future. Furthermore, I want the coefficient on Offensive_FG% to be constrained to 1, so I move it to the left-hand side, and the model I actually run is this one (using OLS):

Shot_made – Offensive_FG% = Home + Defensive_P1 + Defensive_P2 + Defensive_P3 + Defensive_P4 + Defensive_P5 + …
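To make the rearranged model concrete, here is a small self-contained sketch of a no-intercept OLS fit. It solves the normal equations by hand so that it runs without any libraries; in practice a numerical package would be the usual choice. The data layout and function names are assumptions for illustration, not the code actually used.

```python
def ols_no_intercept(rows, targets):
    """Least-squares coefficients for y = X b, with no intercept column.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_defensive_impact(shots, players):
    """Fit Shot_made - Offensive_FG% on Home and on-court indicators.

    `shots` is a list of (made, offensive_fg, defense_is_home, defenders)
    tuples, where `defenders` is the set of defensive players on the court.
    """
    rows = [
        [1.0 if home else 0.0] + [1.0 if p in on_court else 0.0 for p in players]
        for _, _, home, on_court in shots
    ]
    targets = [made - off_fg for made, off_fg, _, _ in shots]
    coefs = ols_no_intercept(rows, targets)
    return {"Home": coefs[0], **dict(zip(players, coefs[1:]))}
```

Moving Offensive_FG% to the left-hand side is what pins its coefficient at 1: the regression only ever sees how far each shot's outcome landed from the shooter's expected FG%.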

This is exactly how adjusted plus minus is calculated, just with a different unit of observation. But I don’t just run one regression! Instead, I run 1,750 regressions, one for each 1ft x 1ft area on the court (I don’t go all the way out to mid-court, so it’s a 50ft x 35ft area). In each regression, I include only shots taken inside or near that 1ft x 1ft area (see footnote 2). The coefficients returned can be simply interpreted as the change, in percentage points of opposing FG%, that a defensive player makes at that particular location.
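The per-square shot selection can be sketched as follows, using the definition of "near" from footnote 2 (within 12 feet and of the same shot type). The grid orientation, the tuple layout, and measuring the 12 feet from the square's center are assumptions.

```python
import math

# One regression per 1ft x 1ft square of the 50ft x 35ft area.
GRID = [(col, row) for col in range(50) for row in range(35)]


def shot_zone(dist_from_basket_ft, is_three):
    """Classify a shot as 'close', 'mid-range' (>8 ft), or '3pt'."""
    if is_three:
        return "3pt"
    return "mid-range" if dist_from_basket_ft > 8 else "close"


def shots_near_square(center, center_zone, shots, max_dist_ft=12.0):
    """Shots within `max_dist_ft` of `center` that share the square's zone.

    `shots` is a list of (x, y, dist_from_basket_ft, is_three) tuples and
    `center_zone` is the zone the square itself falls in.
    """
    cx, cy = center
    return [
        shot
        for shot in shots
        if shot_zone(shot[2], shot[3]) == center_zone
        and math.hypot(shot[0] - cx, shot[1] - cy) <= max_dist_ft
    ]
```

Each of the 1,750 squares gets its own subset of shots this way, so neighboring squares share most of their data and their coefficients change smoothly across the court.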

The next step is plotting these coefficients on the court. Plotting is done in 1ft x 1ft squares. First, I find the 250 squares for each player where shot volume defended is highest for that particular player. These are the 250 locations I will plot (more than this is visually confusing). For each location, the square is colored according to the coefficient, with blue squares indicating a negative coefficient. These can be simply interpreted: if a player’s coefficient from a location is -0.05, it means that when that player is defending and a shot is taken from that location, the offensive player is 5 percentage points less likely to make the shot than he would be on average. By contrast, a positive coefficient means that the offensive player is more likely to make that shot than he otherwise would be. So blue squares indicate good defense, red squares bad defense.
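A small sketch of the selection and coloring logic. The function names are hypothetical, and treating a coefficient of exactly zero as "neutral" is an assumption.

```python
import heapq


def top_squares(volume_by_square, limit=250):
    """The `limit` squares with the most shots defended, busiest first."""
    return heapq.nlargest(limit, volume_by_square, key=volume_by_square.get)


def defense_color(coefficient):
    """Blue for a negative coefficient (good defense), red for positive."""
    if coefficient < 0:
        return "blue"
    if coefficient > 0:
        return "red"
    return "neutral"
```

For example, a coefficient of -0.05 maps to blue: shooters make that shot 5 percentage points less often than expected when the player is defending.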

Squares are sized according to the number of shots taken in that square. Shot volumes are first logged and then scaled between a minimum box size (currently 8 pixels) and a maximum box size (currently 25 pixels).
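The sizing rule might look like the sketch below. It assumes a natural log, a linear rescale between the smallest and largest logged volumes, and that every plotted square has at least one shot (the log of zero is undefined).

```python
import math


def box_size(volume, min_volume, max_volume, min_px=8, max_px=25):
    """Box side in pixels: log the volume, then rescale to [min_px, max_px]."""
    low, high = math.log(min_volume), math.log(max_volume)
    if high == low:
        # All squares have the same volume; pick the largest size.
        return float(max_px)
    share = (math.log(volume) - low) / (high - low)
    return min_px + share * (max_px - min_px)
```

Logging first keeps the handful of very high-volume squares near the rim from shrinking every other square down to the minimum size.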

  1. I run this by team, so if I perform this regression for the Pacers, for example, there will be 8 Defensive_P* variables: one for each Pacers player who has defended at least 1,000 shots.
  2. What does near mean? It means within 12 feet as long as the shot is of the same type: either close, mid-range, or 3pt. Mid-range is defined as >8 feet from the basket.