Lazy Sunday, time to post some more NBA defenses. The first game today is OKC v. LAC, two very good defenses and an opportunity to see how good defenses actually operate very differently. First up is the Clippers, who are not a particularly good defensive team in the paint but are currently leading the league in 3pt defense, where they absolutely shut down the opposition. As a refresher, the numbers are PPS from the square they are on. The Clippers defend the arc so well that PPS from behind the arc and from midrange is almost a wash.
This is just a catch-all post about methods that I will reference in the future when I post a graph or a regression or whatever. My plan is to update this every time I add something new that I think requires further explanation. So without further ado…
From here on out, the way I do the shot-chart visualizations should be fairly stable. There are really only a couple of things that need explanation here. Data is usually current as of the date of the blog post but does not update automatically, so backdate appropriately. All shots taken against a team or by a player or whatever it is are grouped into 1ft x 1ft squares that cover the court. 2 and 3 point shots are not mixed in this process. Basically if a square’s center is inside the arc, it should contain only 2pt shots, and if the square’s center is outside the arc it should contain only 3pt shots.
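For anyone who wants to play along at home, here is a minimal sketch of the binning step in Python. It is not my actual plotting code: the `(x_ft, y_ft, value, made)` tuple format and the function names are made up for illustration, and it keeps 2s and 3s apart by putting the shot's point value in the bin key rather than testing each square's center against the arc.

```python
import math
from collections import defaultdict

def bin_shots(shots):
    """Group shots into 1ft x 1ft squares, keeping 2s and 3s separate.

    Each shot is a hypothetical (x_ft, y_ft, value, made) tuple, where
    value is 2 or 3 and made is True/False.
    """
    bins = defaultdict(lambda: {"attempts": 0, "points": 0})
    for x, y, value, made in shots:
        # The square is identified by its lower-left corner; the point
        # value is part of the key so 2s and 3s never share a bin.
        key = (math.floor(x), math.floor(y), value)
        bins[key]["attempts"] += 1
        if made:
            bins[key]["points"] += value
    return bins

def points_per_shot(bins):
    """Points per shot (PPS) for every non-empty square."""
    return {key: b["points"] / b["attempts"] for key, b in bins.items()}
```

Calling `points_per_shot(bin_shots(shots))` gives one PPS number per square, which is what the labels on the charts show.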
The 76ers traded Spencer Hawes to the Cavaliers for two second round draft picks this week. The trade caused a bit of a kerfuffle on my Twitter feed, as several people seemed to think that the Cavaliers fleeced the 76ers because 2nd round picks have little value. Is that true? Before I get into the analysis, I want to caveat by saying that I think this is a great trade for the 76ers regardless. Hawes is about to become a free agent and has no future in Philly, so getting 2 picks for him is basically getting something for nothing (there was some filler but the Sixers didn’t take back anything damaging). I doubt they could have done better, because first round picks are very highly valued right now (maybe overvalued; I’ll try to explore this in a future post).
I collected data on every draft pick made between 2000 and 2009. For each draft pick, I found the maximum Player Efficiency Rating (PER) that player obtained in the NBA. For the rest of the post, that’s the number I’ll be working with: the maximum PER a draft pick has obtained for a season with more than 200 minutes. This is a very generous measure to use, because some players manage to attain a high PER but can’t maintain that level of production over more than one year. Hawes routinely has a PER above 13 and has hit 18 once, so using maximum PER is biasing my results in favor of the 2nd round draft picks and against Hawes.
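The measure itself is simple enough to sketch in a few lines of Python. This is an illustration rather than the code I actually ran: the `(player, minutes, per)` tuple layout and the function name are assumptions, and players with no qualifying season get a 0.

```python
def max_per(seasons, min_minutes=200):
    """Return {player: max PER over seasons with more than min_minutes}.

    seasons is a hypothetical iterable of (player, minutes, per) tuples.
    Players with no qualifying season are assigned 0.0.
    """
    players = set()
    best = {}
    for player, minutes, per in seasons:
        players.add(player)
        if minutes > min_minutes:
            # Keep the highest PER seen in any qualifying season.
            if player not in best or per > best[player]:
                best[player] = per
    return {p: best.get(p, 0.0) for p in players}
```

Note that a big PER in a tiny-minutes season is ignored, which is the whole point of the 200-minute cutoff.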
First, here’s a graph that shows max PER vs draft pick number. PERs have been jittered[1] so that you can see how many points there are at 0. A PER of 0 usually indicates that the player never played in the NBA or never had a season with more than 200 minutes.
[1] This means that a small random value has been added to or subtracted from each point.
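A minimal sketch of jittering in Python, assuming uniform noise; the function name, the `amount` default, and the `seed` parameter are illustrative choices, not the settings used for the graph.

```python
import random

def jitter(values, amount=0.5, seed=None):
    """Add a small uniform random offset in [-amount, amount] to each value."""
    rng = random.Random(seed)  # a seed makes the jitter reproducible
    return [v + rng.uniform(-amount, amount) for v in values]
```

Jittering only changes how the points are drawn, so a pile of players stacked at exactly 0 spreads out into a visible cloud.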
Face it: Greg Monroe is not going to be dealt. Kyle Lowry is not going to be dealt. Pau Gasol is not going to be dealt. We’re all going to wake up on Friday and ask ourselves, “wait a second, wasn’t yesterday the trade deadline?” Here’s a look at some NBA defenses to ease the pain.
How about them Pacers? The boxes here are colored according to the league average, so blue indicates that opponents shoot worse than the league average when they face the Pacers, while warm colors indicate that opponents shoot better than the league average. No big surprises here–the Pacers have a dominant defense. But something interesting jumped out at me and I’ve flagged it by labeling the PPS (points per shot) of high volume locations. The PPS for some midrange shots is actually higher than the PPS for some 3-pt and rim shots! That’s just crazy. Generally speaking, mid-range shots are a poor value compared to 3s and shots at the rim. In a version of this graph that used four-week-old data, there were even more mid-range locations that paid off, but it looks like Indiana has gotten even a little better since then. Just brutal.
Opposing FG% compared to league average, Indiana Pacers
More graphs below the jump!
This is a quick[1] little demonstration I made for my POLS 206 class to show how single member districts in the House of Representatives can cause weird stuff to happen. The map shows the composition of the House delegation from each state. House delegations from red states are mostly Republican, those from blue states are mostly Democratic, and those from purple states are split in some way.
What I want to demonstrate to my class is the difference between the composition of a state’s House delegation and the popular vote for members of the House in that state. In Maine, for example, two out of two Representatives are Democrats, but 38% of the state’s voters voted for a Republican representative. If you believe that a ‘fair’ House delegation is one in which the number of Rs and Ds reflects the split between Rs and Ds in the state’s voters, then Maine should have 1 Republican rep and 1 Democratic rep (0.38*2=0.76, rounds to 1 Republican). When you mouse over a state, the state will change color to reflect the House delegation split that would most accurately mirror the popular vote split. Both these numbers are shown in the upper right corner of the map.
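The Maine arithmetic can be written as a tiny Python function. This is a sketch under two assumptions of mine: every seat goes to one of the two major parties, and halves round up (Python's built-in `round` rounds halves to the nearest even number, so the sketch avoids it). The function name is made up.

```python
import math

def proportional_split(seats, republican_share):
    """Return (republican_seats, democratic_seats) mirroring the vote share.

    Assumes a two-party split; republican_share is a fraction in [0, 1].
    """
    # Round half up instead of using round(), which rounds halves to even.
    republicans = math.floor(seats * republican_share + 0.5)
    return republicans, seats - republicans
```

For Maine, `proportional_split(2, 0.38)` returns `(1, 1)`: one Republican and one Democrat, as in the example above.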
Partisan Composition of State House Delegations
[1] It should have taken like an hour, but CSS.
In keeping with the subject of my last post, I’ve slapped together a partial fix before I start working on a much bigger change in this whole endeavor. One way to deal with binning problems is just to smooth the data somehow. Here’s a picture of a histogram, for example, with both binned data and then a kernel density plot.
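To make the contrast concrete, here is a small pure-Python sketch of both ideas: counting data into fixed-width bins, and a Gaussian kernel density estimate. The function names and the choice of a Gaussian kernel are assumptions for illustration; this is not necessarily how the picture was produced.

```python
import math

def histogram(data, bin_width):
    """Count observations per bin, keyed by each bin's left edge."""
    counts = {}
    for x in data:
        left = math.floor(x / bin_width) * bin_width
        counts[left] = counts.get(left, 0) + 1
    return counts

def gaussian_kde(data, bandwidth):
    """Return a function estimating the density at any point x.

    Every observation contributes a Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density
```

The histogram changes if you shift the bin edges; the density estimate has no edges, and its smoothness is controlled by the bandwidth instead.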
I am still tweaking the graphs I’ve already shown and working on some new things,[1] but I want to post something about the decisions I made in this process. The biggest puzzle in making these kinds of visuals has to do with binning. Binning means sorting data into discrete bins to make it easier to interpret. Here’s a shooting graph where the data has been binned very little (into 1ft x 1ft squares):
[1] Graphs of the on-court/off-court difference players make, adjusting or normalizing graphs to account for teammates and opponents, graphs focused on offense rather than defense, and other things.
A friend noted that in the graphs in my previous post, it is difficult to tell how shot volume changes with Hibbert on and off court. My initial goal was to have the same amount of paint on the on and off court graphs, so the volume of colored area could be compared directly. That would let us see, for example, if opposing teams take more shots in the paint when Hibbert sits.
Unfortunately, I can’t figure out a way to do this. The problem is that so much of the shot volume is near the basket (about 40% within 8 feet). This makes it difficult to represent shot volume proportionally, as the squares near the basket would have to be enormous. In the graphs on the previous post, I had to use a log scale to make the graphs visually interpretable.
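Here is a rough Python sketch of what a log scale does to square sizes. The exact scaling in my graphs may differ; the function name, the use of `log1p`, and the `max_side` parameter are assumptions for illustration.

```python
import math

def square_side(attempts, max_attempts, max_side=1.0):
    """Side length of a square, scaled by the log of its shot volume.

    The busiest square gets max_side; empty squares get 0.
    """
    if attempts <= 0:
        return 0.0
    # log1p(n) = log(1 + n), so a single attempt still gets a visible square.
    return max_side * math.log1p(attempts) / math.log1p(max_attempts)
```

With linear scaling, a square with 20 attempts would be 5% the size of one with 400; on the log scale it is closer to half, which keeps low-volume locations visible but means areas can no longer be compared directly.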
Another possibility, however, is some kind of heat map. The heat maps below show shot locations of the opposing team with Hibbert on and off the court.
Hibbert on court
NBA observers are always talking about how some player makes everyone around him better. This sports cliche is almost always used in basketball to talk about point guards or a combo/wing player with good court vision in the mold of Kobe or LeBron. The ‘makes his teammates better’ meme is actually particularly apt for describing basketball. Sure, a quarterback and a receiver need each other, and a quarterback needs his offensive line. And yeah, one bad fielder can ruin a good double play. But teammates in these sports are not as reliant on each other as the five players on a basketball court are.
There are 14 years of data, starting in 2000, but the first two appear to be incomplete and of course 2013 isn’t over yet, so I restricted my analysis to 2002-2013. I wanted to map the data, not because I had any interesting point to make about the location of federal contracts, but rather because I wanted to get more practice using Kartograph. Unfortunately, the raw data does not provide very good information on the locations of contract winners. Most of the time there is a little bit of location data embedded in a field for the contract winner’s name. But these are often nothing more than zip codes or incomplete addresses. To locate the data on a map, these addresses needed to be geocoded (turning an address into a lat/lon coordinate pair).