I’m going to try to do something quick for Andy Kriebel and Andy Cotgreave’s #makeovermonday every week. That way I can keep showcasing some of the different types of graphs you can make in my graphing web app Playfair, and I can identify and quash bugs! This week they chose an interactive from David McCandless’s informationisbeautiful.net showing the number of records leaked in various data breaches between 2004 and 2016. Andy gives a nice rundown of the pros and cons of the original graphic.
I made the mistake of looking at a few early entries, and a major theme that seems to have struck several people is the division between data breaches that were the result of hacks and those that weren’t (including user error and lost equipment). A quick look at the data shows that hacked records are increasing rapidly. My first thought was that an area chart would show this trend nicely, but then I remembered that I recently implemented a variation on area charts that might be neat here. I’m actually not sure what you call this kind of chart, but it’s simply an area chart with two categories where both areas originate from y=0: the area for one category sits above the x-axis and the area for the other sits below it. Here’s my entry for this data set:
Why would you want to do this? I like this type of chart because it fixes one of the primary problems with stacked area charts: it’s easy to read quantities for the bottom-most category, which starts at y=0, but hard to figure out the year-by-year quantities for any other category because they no longer start at 0. You have to find the top of the category underneath and subtract it from the top of the category you’re interested in.
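Here’s a quick Python sketch of that difference, using three years from the data table in this post (values in millions of records). The helpers `stacked_bands` and `diverging_bands` are my own illustrative names, not anything from Playfair: each one returns the bottom and top edge of every band, and the loop reads the Other value back out the way you would from each kind of chart.

```python
# Millions of records per year, copied from three rows of the data table in this post
hacked = {2012: 453.73, 2013: 236.376, 2014: 364.37}
other = {2012: 224.398792, 2013: 15.926, 2014: 20.128}


def stacked_bands(years):
    """(bottom, top) edges in a conventional stacked area chart: Hacked first, Other piled on top."""
    return {
        y: {"Hacked": (0.0, hacked[y]), "Other": (hacked[y], hacked[y] + other[y])}
        for y in years
    }


def diverging_bands(years):
    """(bottom, top) edges when Hacked rises from y=0 and Other hangs below y=0."""
    return {y: {"Hacked": (0.0, hacked[y]), "Other": (-other[y], 0.0)} for y in years}


years = sorted(hacked)
stacked = stacked_bands(years)
diverging = diverging_bands(years)

for y in years:
    lo, hi = stacked[y]["Other"]
    # Stacked: two readings off the axis plus a subtraction
    from_stacked = hi - lo
    # Diverging: one reading, because the band's far edge is the value itself
    from_diverging = -diverging[y]["Other"][0]
    print(y, round(from_stacked, 3), round(from_diverging, 3))
```

Both approaches recover the same numbers, but in the stacked layout the Other band floats on top of Hacked, so its size only shows up as a difference between two edges. In the two-sided layout every band touches 0, so you can read each value straight off the axis.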
I added a few annotations pulling out a couple of the stories, as I think these add interesting context to what hacking and negligence mean. The focus of my graph is clearly different, though: McCandless’s chart is exploratory, letting you see where the data came from for each breach, whereas mine simply summarizes one interesting trend.
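Here’s the data as I entered it into Playfair. The top and bottom columns are the Records values in millions; for the non-hacked breaches the value goes into bottom as a negative number so that area plots below the x-axis: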
year | type | Records | top | bottom |
---|---|---|---|---|
2004 | Hacked | 92000000 | 92 | 0 |
2005 | Hacked | 0 | 0 | 0 |
2006 | Hacked | 4000000 | 4 | 0 |
2007 | Hacked | 106100000 | 106.1 | 0 |
2008 | Hacked | 6500000 | 6.5 | 0 |
2009 | Hacked | 176521778 | 176.521778 | 0 |
2010 | Hacked | 5976400 | 5.9764 | 0 |
2011 | Hacked | 198735838 | 198.735838 | 0 |
2012 | Hacked | 453730000 | 453.73 | 0 |
2013 | Hacked | 236376000 | 236.376 | 0 |
2014 | Hacked | 364370000 | 364.37 | 0 |
2015 | Hacked | 96797000 | 96.797 | 0 |
2016 | Hacked | 457670436 | 457.670436 | 0 |
2004 | Other | 0 | 0 | 0 |
2005 | Other | 4225000 | 0 | -4.225 |
2006 | Other | 66300000 | 0 | -66.3 |
2007 | Other | 50186405 | 0 | -50.186405 |
2008 | Other | 41566500 | 0 | -41.5665 |
2009 | Other | 80290788 | 0 | -80.290788 |
2010 | Other | 10160076 | 0 | -10.160076 |
2011 | Other | 28327649 | 0 | -28.327649 |
2012 | Other | 224398792 | 0 | -224.398792 |
2013 | Other | 15926000 | 0 | -15.926 |
2014 | Other | 20128000 | 0 | -20.128 |
2015 | Other | 650000 | 0 | -0.65 |
2016 | Other | 68614105 | 0 | -68.614105 |
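If you’d rather not type the top and bottom columns by hand, a short script can derive them from Records. This is a minimal sketch, assuming the same convention as the table (Hacked goes above y=0, everything else goes below it, both in millions); `band` and `fmt` are hypothetical helpers of my own, not part of Playfair, and only a few rows are included for brevity.

```python
# Raw counts from the Records column of the table above (a subset, for brevity)
RECORDS = [
    (2004, "Hacked", 92_000_000),
    (2009, "Hacked", 176_521_778),
    (2004, "Other", 0),
    (2005, "Other", 4_225_000),
    (2012, "Other", 224_398_792),
]


def fmt(x: float) -> str:
    """Show up to six decimals and drop trailing zeros, so 92.0 prints as '92'."""
    return f"{x:.6f}".rstrip("0").rstrip(".")


def band(kind: str, records: int) -> tuple[float, float]:
    """Return (top, bottom) in millions: Hacked sits above y=0, everything else below it."""
    millions = records / 1_000_000
    if kind == "Hacked":
        return millions, 0.0
    # Negate for the lower band, but keep empty years at 0.0 rather than -0.0
    return 0.0, (-millions if millions else 0.0)


rows = []
for year, kind, records in RECORDS:
    top, bottom = band(kind, records)
    rows.append(f"{year} | {kind} | {records} | {fmt(top)} | {fmt(bottom)} |")

print("\n".join(rows))
```

The printed rows use the same pipe layout as the table, so they can be pasted straight in.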
And here’s the variable setup in the area element tab: