Poking around federal contracts

The Sunlight Foundation recently posted a database of federal contracts that it obtained via FOIA (and litigation, apparently). I thought this would be another fun opportunity to work on my JavaScript skills and poke around a new data source. I’ve never really looked at federal contracts, even though a lot of the data is readily accessible online.

There are 14 years of data, starting in 2000, but the first two years appear to be incomplete and of course 2013 isn’t over yet, so I restricted my analysis to 2002-2013. I wanted to map the data, not because I had any interesting point to make about the location of federal contracts, but because I wanted more practice using Kartograph. Unfortunately, the raw data does not say much about where contract winners are located. Most of the time there is a little bit of location data embedded in the field for the contract winner’s name, but it is often nothing more than a zip code or an incomplete address. To put the data on a map, these addresses needed to be geocoded (turned into lat/lon coordinate pairs).

My first thought was to do this using the Google Maps API. Unfortunately, Google’s API only allows an IP address to make 2,500 requests a day before the IP is cut off. At that rate, geocoding 500k addresses would take 200 days, a bit longer than I cared to spend. Luckily, the geocoding provided by MapQuest is basically unlimited. This is pretty rare for APIs, and all they ask for in exchange is some credit, so: Geocoding Courtesy of MapQuest. Thanks MapQuest!

Whenever possible I tried to geocode by zip, using a regular expression to capture either the 5-digit or 9-digit zip code. MapQuest doesn’t seem to handle long, complicated addresses as well as Google does (like 14th and Washington street, Philadelphia PA, zip code here). In testing, I found it frequently returned erroneous results for this kind of string, so zip codes are a bit safer. The tradeoff is that only about 360k of the 500k total records could be geocoded. Let’s just pretend that the 140k I didn’t get are randomly distributed.
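A minimal sketch of that kind of zip extraction, not the exact code used for this post. The function name extractZip and the sample strings are made up for illustration, and taking the last match is an assumption (street numbers can be five digits long too, and the zip usually comes at the end of an address).

```javascript
// Hypothetical helper: pull a 5-digit or 9-digit zip code out of a free-text field.
function extractZip(text) {
  // Five digits, optionally followed by four more (with or without a hyphen),
  // and not embedded in a longer run of digits such as a phone number.
  const matches = text.match(/(?<!\d)\d{5}(?:-?\d{4})?(?!\d)/g);
  // Assumption: if several candidates appear, the last one is the zip code,
  // because earlier ones are more likely to be street numbers.
  return matches ? matches[matches.length - 1] : null;
}

// Example strings are invented, not taken from the contracts data.
console.log(extractZip("ACME CORP 1400 WASHINGTON ST PHILADELPHIA PA 19102"));
console.log(extractZip("12345 MAIN ST SPRINGFIELD IL 62701-1234"));
console.log(extractZip("NO ADDRESS ON FILE"));
```

Records that come back as null are the ones that would be left out of the map.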

A second problem is that loading thirty to forty thousand points onto a map just ain’t gonna work when it’s being done by JavaScript on the client’s computer. I coarsened the map, splitting it up into cells about 70 miles across and lumping the projects in each cell together. The size of each circle on the map below indicates how many projects were undertaken in a particular location. I also ‘jittered’ the circles (basically moved each one in a random direction) so they did not all end up on top of each other. The colors indicate the department that put out the contract.
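The coarsening and jittering can be sketched roughly like this. It is an illustration under some loud assumptions: the field names lat, lon and agency are hypothetical, a cell is one degree on a side (one degree of latitude is about 69 miles, though a degree of longitude is narrower the farther north you go), and points are grouped per agency so each circle keeps a single color.

```javascript
// Assumption: a 1-degree grid stands in for "about 70 miles".
const CELL_DEG = 1.0;

// Group points into grid cells, one bin per (cell, agency) pair.
function binPoints(points, cellDeg = CELL_DEG) {
  const bins = new Map();
  for (const { lat, lon, agency } of points) {
    // Snap each coordinate to the center of the cell it falls in.
    const cellLat = Math.floor(lat / cellDeg) * cellDeg + cellDeg / 2;
    const cellLon = Math.floor(lon / cellDeg) * cellDeg + cellDeg / 2;
    const key = `${cellLat}|${cellLon}|${agency}`;
    const bin = bins.get(key) || { lat: cellLat, lon: cellLon, agency, count: 0 };
    bin.count += 1;
    bins.set(key, bin);
  }
  return [...bins.values()];
}

// Nudge a bin in a random direction by up to maxOffsetDeg degrees so that
// circles for different agencies in the same cell do not stack exactly.
function jitter(bin, maxOffsetDeg = 0.3, random = Math.random) {
  const angle = random() * 2 * Math.PI;
  const radius = random() * maxOffsetDeg;
  return {
    ...bin,
    lat: bin.lat + radius * Math.sin(angle),
    lon: bin.lon + radius * Math.cos(angle),
  };
}

// Invented sample points, not real contract locations.
const samplePoints = [
  { lat: 39.95, lon: -75.16, agency: "DOD" },
  { lat: 39.5, lon: -75.9, agency: "DOD" },
  { lat: 39.95, lon: -75.16, agency: "DOT" },
  { lat: 34.05, lon: -118.24, agency: "DOD" },
];
const circles = binPoints(samplePoints).map((bin) => jitter(bin));
console.log(circles);
```

Each resulting object is one circle: count would drive the radius and agency the color.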

Federal Contracting by Agency, 2012


You can zoom in on the map using the buttons or your mouse wheel. Of course, the map is dominated by Department of Defense dots. Not only does the DOD account for half or more of discretionary government spending, but a huge chunk of what the DOD does requires private contractors (in part due to a push by the DOD since the 80s or so to contract out more of its operations). You hardly see any DOT dots even though DOT is a big agency, because most of its budget goes out as grants to state and local governments rather than as direct federal contracts.

Another interesting field the data provides is the type of activity the contract requires. The graph below shows the number of contracts by type and year for the top 5 contract types (excuse the ugly labels).

The mix of activity types seems pretty stable across years. The spike starting in 2009 shows up across all contract types and probably reflects stimulus spending.
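For anyone who wants to build a similar chart, the tally behind it can be sketched as follows. This is only an illustration: the field names type and year and the function topTypesByYear are hypothetical, not the actual column names in the contracts database.

```javascript
// Count contracts per year for the n most common contract types.
function topTypesByYear(contracts, n = 5) {
  // First pass: total number of contracts per type across all years.
  const totals = new Map();
  for (const { type } of contracts) {
    totals.set(type, (totals.get(type) || 0) + 1);
  }
  // Keep the n types with the highest totals.
  const top = [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([type]) => type);

  // Second pass: per-year counts for the top types only.
  const series = new Map(top.map((type) => [type, new Map()]));
  for (const { type, year } of contracts) {
    const byYear = series.get(type);
    if (byYear) byYear.set(year, (byYear.get(year) || 0) + 1);
  }
  return series;
}

// Invented sample records, not real contract data.
const sampleContracts = [
  { type: "A", year: 2008 },
  { type: "A", year: 2009 },
  { type: "A", year: 2009 },
  { type: "B", year: 2009 },
  { type: "B", year: 2010 },
  { type: "C", year: 2009 },
];
console.log(topTypesByYear(sampleContracts, 2));
```

Each entry in the result is one line on the chart: a contract type mapped to its count per year.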

I didn’t learn much from this one, but I picked up a little JavaScript!
