@treycausey tweeted an Uber blog post on how Uber reduced DUI arrests in Seattle. Using a technique called regression discontinuity, the post claims that Uber has reduced DUI arrests by about 10% on average. The post bugged me, though, because it gives little detail on the methods, and regression discontinuity is the sort of research design whose results depend heavily on specification. In this post I replicate the study and walk through what regression discontinuity is and why it can be a very effective research design. Ultimately I think it’s plausible that Uber did in fact reduce DUIs in Seattle, but the story is a bit more complex than the blog post lets on. Continued under the break!
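To make the point about specification concrete, here is a minimal sketch of a sharp regression discontinuity estimate. It is not the specification used in the Uber post or in my replication. It runs on synthetic data, and the names `fit_line` and `rd_estimate`, the cutoff day, and the bandwidths are all assumptions made for illustration. It fits a separate line on each side of the cutoff and reports the gap between the two lines at the cutoff.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(days, outcomes, cutoff, bandwidth):
    """Sharp RD: gap between the left and right fitted lines at the cutoff."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(d - cutoff, y) for d, y in zip(days, outcomes)
            if cutoff - bandwidth <= d < cutoff]
    right = [(d - cutoff, y) for d, y in zip(days, outcomes)
             if cutoff <= d <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Synthetic daily series (made up): a mild upward trend, noise,
# and a drop of 1.0 starting on a hypothetical launch day.
rng = random.Random(0)
days = list(range(200))
cutoff = 100
outcomes = [10 + 0.01 * d - (1.0 if d >= cutoff else 0.0) + rng.gauss(0, 0.5)
            for d in days]

# The estimate moves with the bandwidth, which is one of the
# specification choices a write-up needs to report.
for bandwidth in (20, 50, 100):
    print(bandwidth, round(rd_estimate(days, outcomes, cutoff, bandwidth), 2))
```

The bandwidth is only one such choice. The functional form on each side of the cutoff and the handling of seasonality matter as well, which is why a headline number without those details is hard to evaluate.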
There are 14 years of data, starting in 2000, but the first two appear to be incomplete and of course 2013 isn’t over yet, so I restricted my analysis to 2002-2013. I wanted to map the data, not because I had any interesting point to make about the location of federal contracts, but rather because I wanted to get more practice using Kartograph. Unfortunately, the raw data does not provide very good information on the locations of contract winners. Most of the time there is a little bit of location data embedded in the field for the contract winner’s name, but it is often nothing more than a zip code or an incomplete address. To place the contracts on a map, these addresses needed to be geocoded (converted into lat/lon coordinate pairs).
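As a rough illustration of that geocoding step, the sketch below pulls a ZIP code out of a free-text name field and looks it up in a small table of ZIP centroids. This is not the code I used. The sample strings, the `ZIP_CENTROIDS` table (with approximate coordinates), and the function names are all hypothetical, and a real run would rely on a geocoding service or a full ZIP centroid file.

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix, not embedded in a longer number.
# Caveat: a five-digit street number would also match, so real data needs more checks.
ZIP_PATTERN = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

# Approximate centroids for two ZIP codes, for illustration only.
ZIP_CENTROIDS = {
    "20001": (38.91, -77.02),   # Washington, DC (approximate)
    "98101": (47.61, -122.33),  # Seattle, WA (approximate)
}


def extract_zip(name_field):
    """Return the first five-digit ZIP found in the field, or None."""
    match = ZIP_PATTERN.search(name_field)
    return match.group(1) if match else None


def geocode(name_field):
    """Map a name field to a (lat, lon) pair via its ZIP, or None if unknown."""
    return ZIP_CENTROIDS.get(extract_zip(name_field))


# Hypothetical name fields in the style described above.
for field in ("ACME CORP, WASHINGTON DC 20001-1234", "NO LOCATION INC"):
    print(field, "->", geocode(field))
```

Records with no recoverable ZIP come back as `None`, so they can be counted and reported rather than silently dropped from the map.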
I used the Yelp API to get data on BBQ restaurants in each of the 30 largest US cities (according to Wikipedia, city limits only). You’re really not supposed to use the API like this and in the future I’ll use one of Yelp’s academic data sets. To make it work, I had to split each city up into 625 lat/lon grid points and query each one separately. I averaged the ratings of the restaurants I found to get an average for the city. I also sampled a small number of restaurants in each city and collected review text for those restaurants. (Yelp really doesn’t like you doing this, since it can’t be done with the API, and I don’t recommend trying it. Let’s just say that I can no longer read Yelp at home.) I used a dictionary of food adjectives I found on the web to pare the corpus down, and found three words for each city that are frequently used to describe that city’s BBQ. As you can see from mousing over the cities on the map below, this didn’t always work out great. More thoughts under the map…
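The grid-and-average step described above can be sketched roughly as follows. This is a reconstruction under assumptions, not my original script: the 625 points are taken to be a 25 x 25 grid over a bounding box, the bounding box is made up, and `search_fn` is a hypothetical stand-in for whatever performs the actual API query. No real Yelp calls are made here.

```python
def grid_points(south, west, north, east, per_side=25):
    """Evenly spaced (lat, lon) points covering a bounding box; 25 x 25 = 625."""
    lat_step = (north - south) / (per_side - 1)
    lon_step = (east - west) / (per_side - 1)
    return [(south + i * lat_step, west + j * lon_step)
            for i in range(per_side)
            for j in range(per_side)]


def city_average(points, search_fn):
    """Query every grid point and average ratings over distinct restaurants."""
    ratings = {}
    for lat, lon in points:
        for business in search_fn(lat, lon):
            # Neighboring grid points return overlapping results,
            # so key on the restaurant id to count each one once.
            ratings[business["id"]] = business["rating"]
    if not ratings:
        return None
    return sum(ratings.values()) / len(ratings)


def fake_search(lat, lon):
    """Hypothetical stand-in for an API query; returns the same two places everywhere."""
    return [{"id": "smokehouse-a", "rating": 4.0},
            {"id": "smokehouse-b", "rating": 3.0}]


points = grid_points(30.0, -98.0, 30.5, -97.5)  # made-up bounding box
print(len(points), city_average(points, fake_search))
```

Deduplicating by restaurant id matters here. Without it, restaurants near the center of a city would be counted once per overlapping query and would dominate the city average.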
US BBQ according to Yelp reviews
Circle size is proportional to the number of restaurants. Mouse over a city to see adjectives commonly used in Yelp reviews for area restaurants. Legend: average restaurant review.